WHITE PAPER

QLogic TrueScale™ DDR IB Adapter Provides Scalable, Best-In-Class Performance

QLogic’s DDR Adapters Outperform Mellanox®: QLogic’s Message Rate 340% Better and Scalable Latency Up to 33% Superior

Executive Summary

Solving today’s most challenging computational problems requires more powerful, cost-effective, and power-efficient systems. As clusters and the number of processors per cluster grow to address problems of increasing complexity, the communication needs of the applications also increase. Consequently, interconnect performance is crucial for application scaling. Satisfying the high performance requirements of Inter-Processor Communications (IPC) requires an interconnect that:

•   Efficiently processes a variety of message patterns
•   Leverages the benefits of multi-core processors
•   Scales with the size of the fabric
•   Minimizes power requirements

QLogic Host Channel Adapters (HCAs) have been architected with these design goals in mind to provide significantly better scaling performance than any other InfiniBand™ (IB) architecture. As a result, a measurable and sustainable difference in application performance can be realized when deploying the TrueScale IB architecture.

QLogic has performed a series of head-to-head performance benchmarks showing the I/O performance and scalability advantages of its 7200 Series of Dual Data Rate (DDR) IB adapters over Mellanox ConnectX™ adapters. The findings in this paper demonstrate that QLogic TrueScale adapters are the best choice for High Performance Computing (HPC) applications.

Key Findings

The QLogic 7200 Series DDR InfiniBand adapters offer better message rate and scalable latency than Mellanox’s ConnectX adapters. The test results described in this paper suggest that:

•   Message rate performance is over 340 percent better than ConnectX
•   Scalable latency is up to 33 percent superior to ConnectX
•   TrueScale bandwidth performance is anywhere from 120 to 70 percent better at 128- and 1024-byte message sizes, respectively
•   HPC customers can reap the benefits of TrueScale adapters, which significantly outperform Mellanox DDR adapters as the size of the cluster increases
Results

The most accurate way to establish the best interconnect option for a given application is to install and run the application on a variety of fabrics to determine the best-performing option. However, given the costs associated with this approach, the use of industry-standard benchmarks is a more pragmatic means of evaluating an interconnect.

For applications with heavy messaging requirements, message rate performance is a good indicator of how well an interconnect will be able to support the needs of an application. Another factor to consider is how well the interconnect maintains its performance as the system is scaled. The High Performance Computing Challenge (HPCC) scalable latency and scalable message rate benchmarks are strong indicators of how well the interconnect will support an application at scale.

Architecturally, ConnectX is designed to offload more of the burden of communication processing from the CPU to the adapter. This design can provide benefits in CPU utilization, especially when using single- or dual-core compute nodes. However, given the availability of multiple cores in today’s compute nodes, this approach is no longer optimal. As more cores are added to a node, the communications burden on a single adapter increases significantly. This results in an increased dependency on the adapter’s capabilities for scalable “system” performance. Consequently, scalability anomalies can begin to appear when the number of cores in a compute node increases to four or five.

Primarily due to the offload capability of ConnectX, Mellanox’s adapters require significantly more power to operate: as much as 50 percent more than TrueScale adapters. The additional wattage required to power the compute nodes is also reflected in the associated higher cooling costs to bring down the ambient temperature in the data center.

The TrueScale architecture is designed to support highly scaled applications with high message rate and ultra-low scalable latency performance. In both “scale-up” (multi-core environments) and “scale-out” (large node count) clusters, the efficient message processing capabilities of the adapter enable more effective use of the available compute resources, resulting in application performance benefits as the number of cores per node and the number of nodes in a cluster increase.

Microbenchmarks

Table 1 summarizes QLogic’s findings in scalable benchmark performance between ConnectX and TrueScale IB adapters.

Message Rate

As seen in Table 1, at eight processes per node (ppn), TrueScale message rate performance is over three times that of ConnectX.

OSU’s Multiple Bandwidth/Message Rate benchmark (osu_mbw_mr) was run on two servers connected by a 1 m cable (no switch), each server with 2x 3.0 GHz Intel® Harpertown E5472 quad-core CPUs, 16 GB RAM, and RHEL 5. ConnectX runs used OFED 1.3 and MVAPICH-1.0.0 (default options); TrueScale runs used InfiniPath® 2.2/OFED 1.3 and QLogic MPI (default options).

As multi-core systems become increasingly prevalent, the cluster interconnect must be able to accommodate more processes per compute node. The TrueScale architecture was designed with this trend in mind, enabling users to take maximum advantage of all the cores in their compute nodes. This is accomplished through its high message rate and superior inter- and intra-node communication capabilities.
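The osu_mbw_mr benchmark measures message rate with a windowed pattern: the sender streams a fixed window of small messages, waits for a single acknowledgment, and divides messages sent by elapsed time. The sketch below reproduces only that measurement structure between two local threads; it exercises a Python queue rather than an InfiniBand adapter, and the window size and iteration count are illustrative choices, not the benchmark’s defaults.

```python
# Sketch of the windowed message-rate measurement used by osu_mbw_mr:
# stream WINDOW small messages, wait for one ack, repeat, then report
# messages sent per second. Runs over a local queue, so the number it
# prints reflects queue overhead, not network performance.
import queue
import threading
import time

WINDOW = 64        # messages in flight before an ack (illustrative)
ITERATIONS = 200   # number of windows to send (illustrative)

def receiver(data_q, ack_q):
    for _ in range(ITERATIONS):
        for _ in range(WINDOW):
            data_q.get()          # drain one window of messages
        ack_q.put("ack")          # one acknowledgment per window

def message_rate():
    data_q, ack_q = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=receiver, args=(data_q, ack_q))
    worker.start()
    start = time.perf_counter()
    for _ in range(ITERATIONS):
        for _ in range(WINDOW):
            data_q.put(b"x")      # 1-byte message, as in the test above
        ack_q.get()               # wait for the window acknowledgment
    elapsed = time.perf_counter() - start
    worker.join()
    return WINDOW * ITERATIONS / elapsed   # messages per second

if __name__ == "__main__":
    print(f"{message_rate():,.0f} messages/s over a local queue")
```

The real benchmark runs many such sender/receiver pairs in parallel (one per process) between two nodes and reports the aggregate rate, which is why the processes-per-node count matters in the results above.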
Table 1. Summary of QLogic’s Message Rate and Scalable Latency Advantage Over Mellanox

  Comparison                     Benchmark                              Mellanox® (MHGH28 | MHGH29)    QLogic (QLE7240 | QLE7280)    QLogic Advantage
  Message Rate (non-coalesced)   OSU Message Rate @ 8 ppn               4.5 | 5.5 million messages/s   19 | 26 million messages/s    Over 340%
  Scalable Latency               HPCC Random Ring Latency @ 128 cores   4.4 | 8.9 µs                   1.3 | 1.1 µs                  Up to 33%
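The figures in the “QLogic Advantage” column can be reproduced from the raw numbers in the table. The sketch below makes the arithmetic explicit; the values are copied from Table 1, and the formulas (“percent better” for message rate, “fraction of baseline” for latency, where lower is better) are the conventional readings of such comparisons:

```python
# Reproducing the Table 1 comparisons from the raw figures.
mellanox_rate = {"MHGH28": 4.5, "MHGH29": 5.5}     # million messages/s
qlogic_rate   = {"QLE7240": 19.0, "QLE7280": 26.0}

mellanox_lat = {"MHGH28": 4.4, "MHGH29": 8.9}      # microseconds at 128 cores
qlogic_lat   = {"QLE7240": 1.3, "QLE7280": 1.1}

def percent_better(new, old):
    """How much higher `new` is than `old`, in percent."""
    return (new - old) / old * 100.0

# Message rate: higher is better.
rate_gain_7240 = percent_better(qlogic_rate["QLE7240"], mellanox_rate["MHGH28"])  # ~322%
rate_gain_7280 = percent_better(qlogic_rate["QLE7280"], mellanox_rate["MHGH29"])  # ~373%

# Latency: lower is better, so express QLogic as a fraction of Mellanox.
lat_frac_7240 = qlogic_lat["QLE7240"] / mellanox_lat["MHGH28"] * 100.0  # ~30%
lat_frac_7280 = qlogic_lat["QLE7280"] / mellanox_lat["MHGH29"] * 100.0  # ~12%

print(f"Message rate gains: {rate_gain_7240:.0f}% and {rate_gain_7280:.0f}%")
print(f"Latency fractions: {lat_frac_7240:.0f}% and {lat_frac_7280:.0f}% of ConnectX")
```

The two part-for-part message-rate gains (roughly 322 and 373 percent) bracket the 340 percent headline figure, and the latency fractions (roughly 30 and 12 percent of ConnectX) are in line with the 13-to-33-percent range quoted later in the paper.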


HSG-WP08014                                                                  IB0030901-00 A
Figure 1 illustrates the ability of TrueScale to make effective use of multi-core nodes.1 Note that ConnectX does not scale as the processes per node increase. With TrueScale, more application work is accomplished as the node size increases.

Figure 1. TrueScale Multi-core Advantage in Message Rate Performance

Scalable Latency

In terms of scalable latency performance, at 128 cores, QLogic’s MPI latency ranges from 13 percent to 33 percent of Mellanox’s ConnectX.

All scalable latency results are from the HPC Challenge web site (http://icl.cs.utk.edu/hpcc/hpcc_results_all.cgi) and use the Random Ring Latency benchmark. ConnectX Gen1 results are from the 2008-05-15 submission by Intel using 128 cores of the Intel Endeavour cluster with Xeon® E5462 CPUs (2.8 GHz); ConnectX Gen2 results are from the 2008-05-09 submission by TU Dresden using 128 cores of the SGI® Altix® ICE 8200EX cluster with Xeon X5472 CPUs (3.0 GHz). QLogic QLE7240 results are from QLogic’s 2008-08-05 submission using 128 cores of the Darwin Cluster with Xeon 5160 CPUs (3.0 GHz); QLogic QLE7280 results are from QLogic’s 2008-08-01 submission using 128 cores of the QLogic Benchmark Cluster with Xeon E5472 CPUs (3.0 GHz).

Figure 2 shows that TrueScale adapters maintain consistent latency performance as more cores are added to a node.2 Consequently, more of the compute power can be used for the application workload rather than for waiting for the adapter to process messages.

Figure 2. TrueScale Multi-core Advantage in Latency Performance

When measuring latency with a realistic 128-byte message size, the latency performance of ConnectX drops off at about four to five cores per node. Under the same conditions, TrueScale provides consistent and predictable levels of performance.

Application Performance

SPEC MPI2007

More sophisticated benchmarks, such as SPEC MPI2007, measure performance at a system level over a variety of different applications. This benchmark suite includes 13 different codes and emphasizes areas of performance that are most relevant to MPI applications running on large-scale systems. The quantity and performance of the microprocessors, the memory architecture, the interconnect, the compiler, and the shared file system are all evaluated.

In August 2008, QLogic ran the SPECmpiM_base2007 benchmark on a TrueScale-enabled cluster and achieved the best overall performance at 96 and 128 cores.3 This result represents third-party validation of the scalable performance capabilities of the architecture over a variety of application types. The result compared favorably not only to other commodity x86-based compute clusters, but also to platforms from large system vendors.

Halo Test

The halo test from Argonne National Laboratory’s mpptest benchmark suite simulates communication patterns in layered ocean models.

1 These are the results of the OSU Multiple Bandwidth/Message Rate (osu_mbw_mr) test. The test used a 1-byte message size when run on two nodes, each with 2x 3.0 GHz Intel Xeon E5472 quad-core CPUs. The test used QLogic MPI 2.2 for QLE7280 adapters, and MVAPICH-1.0.0 and OFED 1.3 on Gen2 ConnectX DDR adapters.
2 These are the results of the OSU Multiple Latency (osu_multi_lat) test of QLE7240 and Gen1 ConnectX HCAs at a 128-byte message size when run on two nodes, each with 2x 2.33 GHz Intel Xeon E5410 quad-core CPUs.
3 Details of the submission and results can be found at: http://www.spec.org/mpi2007/results/res2008q3/
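The Random Ring Latency results above come from a benchmark that arranges all ranks in a ring in random order, so that most neighbor pairs land on different nodes, and then times small-message exchanges between ring neighbors. The ring construction can be sketched as follows (a conceptual illustration, not HPCC’s actual implementation; the rank count matches the 128-core submissions cited above):

```python
import random

def random_ring(n_ranks, seed=0):
    """Return {rank: (left_neighbor, right_neighbor)} for a randomly ordered ring."""
    order = list(range(n_ranks))
    random.Random(seed).shuffle(order)   # random placement around the ring
    pos = {rank: i for i, rank in enumerate(order)}
    return {
        rank: (order[(pos[rank] - 1) % n_ranks],
               order[(pos[rank] + 1) % n_ranks])
        for rank in order
    }

ring = random_ring(128)
left, right = ring[0]
# Each rank exchanges small messages with exactly two (usually remote)
# neighbors; the benchmark reports the latency of those exchanges.
```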
Unlike many of the point-to-point microbenchmarks that measure peak bandwidth, this benchmark measures throughput performance over a variety of message sizes. As seen in Figure 3, TrueScale outperforms Mellanox across the entire range of message sizes.1

Figure 3. TrueScale Bandwidth Performance on Halo Benchmark

Application requirements vary in terms of message sizes and patterns, so performance over a variety of message sizes is a better predictor of application performance than peak measurements. At four processes per node, TrueScale bandwidth performance is anywhere from 120 to 70 percent better at 128- and 1024-byte message sizes, respectively.

Summary and Conclusion

TrueScale is architecturally designed to take advantage of two significant trends in high performance computing clusters: the prevalence of multi-core processors in compute nodes and the need to deploy increasingly larger clusters to tackle more complex computational problems.

The benefits of the TrueScale architecture can be demonstrated in a variety of industry-standard benchmarks that measure the scalable performance characteristics of the interconnect. More importantly, the advantages can be realized through improved application performance and a reduced time-to-solution at about half the power of ConnectX.




1 The benchmark is the halo test from Argonne National Laboratory’s mpptest; in particular, the 2D halo psendrecv test at 4 processes per node on 8 nodes of 2x 2.6 GHz AMD® Opteron™ 2218 CPUs with 8 GB DDR2-667 memory and an NVIDIA® MCP55 PCIe chipset, for a total of 32 MPI ranks. QLogic MPI 2.2 was used for TrueScale adapters and MVAPICH 0.9.9 for ConnectX.
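The 2D halo pattern behind this benchmark can be simulated without MPI: each rank owns a tile of a larger grid, padded with a one-cell “ghost” border, and each step copies the tile’s interior edge cells into its four neighbors’ ghost borders. The sketch below runs the exchange in a single process over plain Python lists (non-periodic boundaries, corners ignored) purely to show the data movement; the real test performs the same exchange with MPI point-to-point messages between 32 ranks.

```python
def make_tile(value, n=4):
    """An (n+2) x (n+2) tile: an n x n interior plus a 1-cell ghost border."""
    return [[value] * (n + 2) for _ in range(n + 2)]

def halo_exchange(tiles):
    """Copy each tile's interior edges into its neighbors' ghost borders."""
    rows, cols = len(tiles), len(tiles[0])
    n = len(tiles[0][0]) - 2                 # interior size
    for i in range(rows):
        for j in range(cols):
            t = tiles[i][j]
            for c in range(1, n + 1):        # interior indices only (skip corners)
                if i > 0:
                    tiles[i - 1][j][n + 1][c] = t[1][c]   # top edge -> upper neighbor
                if i < rows - 1:
                    tiles[i + 1][j][0][c] = t[n][c]       # bottom edge -> lower neighbor
                if j > 0:
                    tiles[i][j - 1][c][n + 1] = t[c][1]   # left edge -> left neighbor
                if j < cols - 1:
                    tiles[i][j + 1][c][0] = t[c][n]       # right edge -> right neighbor

# A 2x2 grid of tiles, each filled with its own rank id (0..3) so the
# exchanged halos are easy to inspect afterwards.
tiles = [[make_tile(2 * i + j) for j in range(2)] for i in range(2)]
halo_exchange(tiles)
# Tile (0,0)'s bottom ghost row now holds 2s (from tile (1,0)) and its
# right ghost column holds 1s (from tile (0,1)).
```

Because each rank touches only small edge-sized messages every step, sustained throughput at modest message sizes, rather than peak bandwidth, is what this access pattern rewards.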






Disclaimer
Reasonable efforts have been made to ensure the validity and accuracy of these performance tests. QLogic Corporation is not liable for any error in this published white paper or the results thereof. Variation in results may be a result of changes in configuration or in the environment. QLogic specifically disclaims any warranty, expressed or implied, relating to the test results and their accuracy, analysis, completeness, or quality.




www.qlogic.com

Corporate Headquarters     QLogic Corporation     26650 Aliso Viejo Parkway     Aliso Viejo, CA 92656     949.389.6000

Europe Headquarters     QLogic (UK) LTD.     Quatro House     Lyon Way, Frimley     Camberley Surrey, GU16 7ER UK     +44 (0) 1276 804 670

© 2008 QLogic Corporation. Specifications are subject to change without notice. All rights reserved worldwide. QLogic, the QLogic logo, InfiniPath, and TrueScale are trademarks or registered trademarks of QLogic Corporation. Mellanox and ConnectX are trademarks or registered trademarks of Mellanox Technologies, Inc. InfiniBand is a trademark and service mark of the InfiniBand Trade Association. Intel and Xeon are registered trademarks of Intel Corporation. SGI and Altix are registered trademarks of Silicon Graphics, Inc., in the United States and/or other countries worldwide. AMD and Opteron are trademarks or registered trademarks of Advanced Micro Devices, Inc. NVIDIA is a registered trademark of NVIDIA Corporation in the United States and other countries. All other brand and product names are trademarks or registered trademarks of their respective owners. Information supplied by QLogic Corporation is believed to be accurate and reliable. QLogic Corporation assumes no responsibility for any errors in this brochure. QLogic Corporation reserves the right, without notice, to make changes in product design or specifications.





Weitere ähnliche Inhalte

Mehr von seiland

Ibm F Co E Blade Center Battle Card
Ibm F Co E Blade Center Battle CardIbm F Co E Blade Center Battle Card
Ibm F Co E Blade Center Battle Cardseiland
 
Ibm F Co E Blade Center Battle Card
Ibm F Co E Blade Center Battle CardIbm F Co E Blade Center Battle Card
Ibm F Co E Blade Center Battle Cardseiland
 
8 Reasons To Choose True Scale
8 Reasons To Choose True Scale8 Reasons To Choose True Scale
8 Reasons To Choose True Scaleseiland
 
Understanding Low And Scalable Mpi Latency
Understanding Low And Scalable Mpi LatencyUnderstanding Low And Scalable Mpi Latency
Understanding Low And Scalable Mpi Latencyseiland
 
8 Reasons To Choose True Scale
8 Reasons To Choose True Scale8 Reasons To Choose True Scale
8 Reasons To Choose True Scaleseiland
 
9000 InfiniBand Datasheet
9000 InfiniBand Datasheet9000 InfiniBand Datasheet
9000 InfiniBand Datasheetseiland
 

Mehr von seiland (6)

Ibm F Co E Blade Center Battle Card
Ibm F Co E Blade Center Battle CardIbm F Co E Blade Center Battle Card
Ibm F Co E Blade Center Battle Card
 
Ibm F Co E Blade Center Battle Card
Ibm F Co E Blade Center Battle CardIbm F Co E Blade Center Battle Card
Ibm F Co E Blade Center Battle Card
 
8 Reasons To Choose True Scale
8 Reasons To Choose True Scale8 Reasons To Choose True Scale
8 Reasons To Choose True Scale
 
Understanding Low And Scalable Mpi Latency
Understanding Low And Scalable Mpi LatencyUnderstanding Low And Scalable Mpi Latency
Understanding Low And Scalable Mpi Latency
 
8 Reasons To Choose True Scale
8 Reasons To Choose True Scale8 Reasons To Choose True Scale
8 Reasons To Choose True Scale
 
9000 InfiniBand Datasheet
9000 InfiniBand Datasheet9000 InfiniBand Datasheet
9000 InfiniBand Datasheet
 

Kürzlich hochgeladen

#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 

Kürzlich hochgeladen (20)

#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf

WHITE PAPER

QLogic TrueScale™ DDR IB Adapter Provides Scalable, Best-In-Class Performance

QLogic's DDR Adapters Outperform Mellanox®: QLogic's message rate is 340% better and its scalable latency up to 33% superior over Mellanox ConnectX™ adapters.

Executive Summary

Solving today's most challenging computational problems requires more powerful, cost-effective, and power-efficient systems. As clusters and the number of processors per cluster grow to address problems of increasing complexity, the communication needs of the applications also increase. Consequently, interconnect performance is crucial for application scaling. Satisfying the high performance requirements of Inter-Processor Communications (IPC) requires an interconnect that:

• Efficiently processes a variety of message patterns
• Leverages the benefits of multi-core processors
• Scales with the size of the fabric
• Minimizes power requirements

QLogic Host Channel Adapters (HCAs) have been architected with these design goals in mind to provide significantly better scaling performance than any other InfiniBand™ (IB) architecture. As a result, a measurable and sustainable difference in application performance can be realized when deploying the TrueScale architecture.

QLogic has performed a series of head-to-head performance benchmarks showing the I/O performance and scalability advantages of its 7200 Series of Dual Data Rate (DDR) IB adapters over Mellanox ConnectX™ adapters. The findings in this paper demonstrate that QLogic TrueScale adapters are the best choice for High Performance Computing (HPC) applications.

Key Findings

The QLogic 7200 Series DDR InfiniBand adapters offer better message rate and scalable latency than Mellanox's ConnectX adapters. The test results described in this paper suggest that:

• Message rate performance is over 340 percent better than ConnectX
• Scalable latency is up to 33 percent superior to ConnectX
• TrueScale bandwidth performance is anywhere from 120 to 70 percent better at 128- and 1024-byte message sizes, respectively
• HPC customers can reap the benefits of TrueScale adapters, which significantly outperform Mellanox DDR adapters as the size of the cluster increases
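The headline percentages in the Key Findings can be reproduced with simple ratio arithmetic on the per-adapter figures reported in Table 1 later in this paper. A minimal sketch follows; the values are copied from Table 1, but exactly how the "340%" and "33%" headlines were rounded is an assumption, since the paper does not spell it out:

```python
# Table 1 figures: OSU message rate at 8 ppn (millions of messages/s).
mellanox_rate = {"MHGH28": 4.5, "MHGH29": 5.5}
qlogic_rate = {"QLE7240": 19.0, "QLE7280": 26.0}

def percent_better(new, old):
    """Percent improvement of `new` over `old`."""
    return (new - old) / old * 100.0

adv_7240 = percent_better(qlogic_rate["QLE7240"], mellanox_rate["MHGH28"])
adv_7280 = percent_better(qlogic_rate["QLE7280"], mellanox_rate["MHGH29"])
print(f"Message-rate advantage: {adv_7240:.0f}% (QLE7240) / {adv_7280:.0f}% (QLE7280)")

# Table 1 figures: HPCC Random Ring Latency at 128 cores (microseconds).
mellanox_lat = {"MHGH28": 4.4, "MHGH29": 8.9}
qlogic_lat = {"QLE7240": 1.3, "QLE7280": 1.1}

# QLogic latency expressed as a percentage of the Mellanox latency.
frac_7240 = qlogic_lat["QLE7240"] / mellanox_lat["MHGH28"] * 100.0
frac_7280 = qlogic_lat["QLE7280"] / mellanox_lat["MHGH29"] * 100.0
print(f"QLogic latency vs. Mellanox: {frac_7240:.0f}% / {frac_7280:.0f}%")
```

The message-rate ratios come out near 322 and 373 percent, and the latency fractions near 30 and 12 percent; the paper's "over 340 percent" and "13 to 33 percent" headlines appear to round across these per-adapter ranges.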
Results

The most accurate way to establish the best interconnect option for a given application is to install and run the application on a variety of fabrics to determine the best performing option. However, given the costs associated with this approach, the use of industry-standard benchmarks is a more pragmatic means of evaluating an interconnect.

For applications with heavy messaging requirements, message rate performance is a good indicator of how well an interconnect will be able to support the needs of an application. Another factor to consider is how well the interconnect maintains its performance as the system is scaled. The High Performance Computing Challenge (HPCC) scalable latency and scalable message rate benchmarks are strong indicators of how well the interconnect will support an application at scale.

Architecturally, ConnectX is designed to offload more of the burden of communication processing from the CPU to the adapter. This design can provide benefits in CPU utilization, especially when using single- or dual-core compute nodes. However, given the availability of multiple cores in today's compute nodes, this approach is no longer optimal. As more cores are added to a node, the communications burden on a single adapter increases significantly. This results in an increased dependency on the adapter's capabilities for scalable "system" performance. Consequently, scalability anomalies can begin to appear when the number of cores in a compute node increases to four or five.

Primarily due to the offload capability of ConnectX, Mellanox's adapters require significantly more power to operate, as much as 50 percent more than TrueScale adapters. The additional wattage required to power the compute nodes is also reflected in the associated higher cooling costs to bring down the ambient temperature in the data center.

The TrueScale architecture is designed to support highly scaled applications with high message rate and ultra-low scalable latency performance. In both "scale-up" (multi-core environments) and "scale-out" (large node count) clusters, the efficient message processing capabilities of the adapter enable more effective use of the available compute resources, resulting in application performance benefits as the number of cores per node and the number of nodes in a cluster increase.

As multi-core systems become increasingly prevalent, the cluster interconnect must be able to accommodate more processes per compute node. The TrueScale architecture was designed with this trend in mind, enabling users to take maximum advantage of all the cores in their compute nodes. This is accomplished through a high message rate and superior inter- and intra-node communication capabilities.

Microbenchmarks

Table 1 summarizes QLogic's findings in scalable benchmark performance between ConnectX and TrueScale IB adapters.

Message Rate

As seen in Table 1, at eight processes per node (ppn), TrueScale message rate performance is over three times that of ConnectX.

OSU's Multiple Bandwidth/Message Rate benchmark (osu_mbw_mr) was run on two servers connected by a 1m cable (no switch), each server with 2x 3.0 GHz Intel® Harpertown E5472 quad-core CPUs, 16 GB RAM, and RHEL 5. ConnectX runs used OFED 1.3 and MVAPICH-1.0.0 (default options). TrueScale runs used InfiniPath® 2.2/OFED 1.3 and QLogic MPI (default options).

Table 1. Summary of QLogic's Message Rate and Scalable Latency Advantage Over Mellanox

  Comparison         Benchmark                             Mellanox (MHGH28 | MHGH29)   QLogic (QLE7240 | QLE7280)   QLogic Advantage
  Message Rate       OSU Message Rate @ 8 ppn              4.5 | 5.5 million msg/s      19 | 26 million msg/s        Over 340%
                     (non-coalesced)
  Scalable Latency   HPCC Random Ring Latency @ 128 cores  4.4 | 8.9 µs                 1.3 | 1.1 µs                 Up to 33%

HSG-WP08014 IB0030901-00 A
Figure 1 illustrates the ability of TrueScale to make effective use of multi-core nodes.¹ Note that ConnectX does not scale as the processes per node increase. With TrueScale, more application work is accomplished as the node size increases.

Figure 1. TrueScale Multi-core Advantage in Message Rate Performance

Scalable Latency

In terms of scalable latency performance, at 128 cores, QLogic's MPI latency ranges from 13 percent to 33 percent of Mellanox's ConnectX.

All scalable latency results are from the HPC Challenge web site (http://icl.cs.utk.edu/hpcc/hpcc_results_all.cgi) and use the Random Ring Latency benchmark. ConnectX Gen1 results are from the 2008-05-15 submission by Intel using 128 cores of the Intel Endeavour cluster with Xeon® E5462 CPUs (2.8 GHz); ConnectX Gen2 results are from the 2008-05-09 submission by TU Dresden using 128 cores of the SGI® Altix® ICE 8200EX cluster with Xeon X5472 CPUs (3.0 GHz). QLogic QLE7240 results are from their 2008-08-05 submission using 128 cores of the Darwin Cluster with Xeon 5160 CPUs (3.0 GHz); QLogic QLE7280 results are from their 2008-08-01 submission using 128 cores of the QLogic Benchmark Cluster with Xeon E5472 CPUs (3.0 GHz).

Figure 2 shows that TrueScale adapters maintain consistent latency performance as more cores are added to a node.² Consequently, more of the compute power can be used for application workload rather than waiting for the adapter to process messages.

Figure 2. TrueScale Multi-core Advantage in Latency Performance

When measuring latency with a realistic 128-byte message size, the latency performance of ConnectX drops off at about four to five cores per node. Under the same conditions, TrueScale provides consistent and predictable levels of performance.

Application Performance

SPEC MPI2007

There are more sophisticated benchmarks, such as SPEC MPI2007, which measure performance at a system level over a variety of different applications. This benchmark suite includes 13 different codes and emphasizes areas of performance that are most relevant to MPI applications running on large-scale systems. The quantity and performance of the microprocessors, memory architecture, interconnect, compiler, and shared file system are all evaluated.

In August 2008, QLogic ran the SPECmpiM_base2007 benchmark on a TrueScale-enabled cluster that yielded the best overall performance at 96 and 128 cores.³ This result represents third-party validation of the scalable performance capabilities of the architecture over a variety of application types. This result compared favorably not only to other commodity x86-based compute clusters, but also against platforms from large system vendors.

Halo Test

The halo test from Argonne National Laboratory's mpptest benchmark suite simulates communications patterns in layered ocean models.

1. These are the results of the OSU Multiple Bandwidth/Message Rate (osu_mbw_mr) test. The test used a 1-byte message size when run on two nodes, each with 2x 3.0 GHz Intel Xeon E5472 quad-core CPUs. The test used QLogic MPI 2.2 for QLE7280 adapters and MVAPICH-1.0.0 with OFED 1.3 on Gen2 ConnectX DDR adapters.
2. These are the results of the OSU Multiple Latency (osu_multi_lat) test of QLE7240 and Gen1 ConnectX HCAs at a 128-byte message size when run on two nodes, each with 2x 2.33 GHz Intel Xeon E5410 quad-core CPUs.
3. Details of the submission and results can be found at: http://www.spec.org/mpi2007/results/res2008q3/
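The halo pattern that mpptest times can be pictured without MPI at all. Below is a serial sketch of a 2D halo exchange on a 2x2 grid of simulated ranks; each array-slice assignment stands in for the send/recv pair an MPI implementation would issue, and the grid and tile sizes are illustrative, not taken from the benchmark:

```python
import numpy as np

# A 2x2 grid of simulated "ranks"; each owns a TILE x TILE interior plus a
# one-cell ghost (halo) border -- the pattern mpptest's 2D halo test exercises.
P, TILE = 2, 4
tiles = {(r, c): np.full((TILE + 2, TILE + 2), float(r * P + c))
         for r in range(P) for c in range(P)}

def halo_exchange(tiles):
    """Copy each neighbor's boundary cells into this tile's ghost cells.

    In an MPI code, each assignment below would be a send/recv exchange
    with a neighboring rank; here the "network" is plain array slicing.
    Only interior cells are read, so in-place updates stay consistent.
    """
    for (r, c), t in tiles.items():
        if r > 0:     t[0, 1:-1]  = tiles[(r - 1, c)][-2, 1:-1]  # from north
        if r < P - 1: t[-1, 1:-1] = tiles[(r + 1, c)][1, 1:-1]   # from south
        if c > 0:     t[1:-1, 0]  = tiles[(r, c - 1)][1:-1, -2]  # from west
        if c < P - 1: t[1:-1, -1] = tiles[(r, c + 1)][1:-1, 1]   # from east

halo_exchange(tiles)
```

In the real benchmark each assignment is a network transfer of a boundary strip, which is why sustained throughput at small and mid-size messages, rather than peak bandwidth, dominates the result.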
Unlike many of the point-to-point microbenchmarks that measure peak bandwidth, this benchmark measures throughput performance over a variety of message sizes. As seen in Figure 3, TrueScale outperforms Mellanox across the entire range of message sizes.¹

Figure 3. TrueScale Bandwidth Performance on Halo Benchmark

Application requirements vary in terms of message sizes and patterns, so performance over a variety of message sizes is a better predictor of performance than peak measurements. At four processes per node, TrueScale bandwidth performance is anywhere from 120 to 70 percent better at 128- and 1024-byte message sizes, respectively.

Summary and Conclusion

TrueScale is architecturally designed to take advantage of two significant trends in high performance computing clusters: the prevalence of multi-core processors in compute nodes and the need to deploy increasingly larger clusters to tackle more complex computational problems. The benefits of the TrueScale architecture can be demonstrated in a variety of industry-standard benchmarks that measure the scalable performance characteristics of the interconnect. More importantly, the advantages can be realized through improved application performance and a reduced time-to-solution at about half the power of ConnectX.

1. The benchmark is the Halo test from Argonne National Laboratory's mpptest; in particular, the 2D halo psendrecv test at 4 processes per node on 8 nodes of 2x 2.6 GHz AMD® Opteron™ 2218 CPUs, 8 GB DDR2-667 memory, and an NVIDIA® MCP55 PCIe chipset, for a total of 32 MPI ranks. QLogic MPI 2.2 was used for TrueScale adapters and MVAPICH 0.9.9 for ConnectX.
Disclaimer

Reasonable efforts have been made to ensure the validity and accuracy of these performance tests. QLogic Corporation is not liable for any error in this published white paper or the results thereof. Variation in results may be a result of changes in configuration or in the environment. QLogic specifically disclaims any warranty, expressed or implied, relating to the test results and their accuracy, analysis, completeness, or quality.

www.qlogic.com

Corporate Headquarters: QLogic Corporation, 26650 Aliso Viejo Parkway, Aliso Viejo, CA 92656, 949.389.6000
Europe Headquarters: QLogic (UK) LTD., Quatro House, Lyon Way, Frimley, Camberley, Surrey, GU16 7ER UK, +44 (0) 1276 804 670

© 2008 QLogic Corporation. Specifications are subject to change without notice. All rights reserved worldwide. QLogic, the QLogic logo, InfiniPath, and TrueScale are trademarks or registered trademarks of QLogic Corporation. Mellanox and ConnectX are trademarks or registered trademarks of Mellanox Technologies, Inc. InfiniBand is a trademark and service mark of the InfiniBand Trade Association. Intel and Xeon are registered trademarks of Intel Corporation. SGI and Altix are registered trademarks of Silicon Graphics, Inc., in the United States and/or other countries worldwide. AMD and Opteron are trademarks or registered trademarks of Advanced Micro Devices, Inc. NVIDIA is a registered trademark of NVIDIA Corporation in the United States and other countries. All other brand and product names are trademarks or registered trademarks of their respective owners. Information supplied by QLogic Corporation is believed to be accurate and reliable. QLogic Corporation assumes no responsibility for any errors in this brochure. QLogic Corporation reserves the right, without notice, to make changes in product design or specifications.