Optimizing Hadoop* Workloads
Nurcan Coskun
Intel Software & Solutions Group
October 2, 2009

Acknowledgements to Jason Dai, Intel SSG, for many of the test results and optimization techniques
Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Other names and brands may
be claimed as the property of others. All products, dates, and figures are preliminary and are subject to change without any notice. Copyright © 2009, Intel Corporation.
Legal Disclaimers
        Disclaimers & Legal Notices
        THE INFORMATION IS FURNISHED FOR INFORMATIONAL USE ONLY, IS SUBJECT TO CHANGE WITHOUT NOTICE, AND SHOULD
        NOT BE CONSTRUED AS A COMMITMENT BY INTEL CORPORATION. INTEL CORPORATION ASSUMES NO RESPONSIBILITY OR
        LIABILITY FOR ANY ERRORS OR INACCURACIES THAT MAY APPEAR IN THIS DOCUMENT OR ANY SOFTWARE THAT MAY BE
        PROVIDED IN ASSOCIATION WITH THIS DOCUMENT. THIS INFORMATION IS PROVIDED "AS IS" AND INTEL DISCLAIMS ANY
        EXPRESS OR IMPLIED WARRANTY, RELATING TO THE USE OF THIS INFORMATION INCLUDING WARRANTIES RELATING TO
        FITNESS FOR A PARTICULAR PURPOSE, COMPLIANCE WITH A SPECIFICATION OR STANDARD, MERCHANTABILITY OR
        NONINFRINGEMENT.
        Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate
        performance of Intel products as measured by those tests. Any difference in system hardware or software design or
        configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance
        of systems or components they are considering purchasing. For more information on performance tests and on the
        performance of Intel products, visit Intel Performance Benchmark Limitations
        INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR
        IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT
        AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY
        WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL
        PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY,
        OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. UNLESS OTHERWISE AGREED
        IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR INTENDED FOR ANY APPLICATION IN WHICH THE
        FAILURE OF THE INTEL PRODUCT COULD CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR.
        Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the
        absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future
        definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The
        information here is subject to change without notice. Do not finalize a design with this information. The products described in
        this document may contain design defects or errors known as errata which may cause the product to deviate from published
        specifications. Current characterized errata are available on request. Contact your local Intel sales office or your distributor to
        obtain the latest specifications and before placing your product order. Copies of documents which have an order number and
        are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or by visiting Intel's
        Web Site http://www.intel.com/.




   2



Why Optimize Hadoop Deployments?




        •Handle more data
        •At lower cost
        •In less time
        •With less power


   3



Where to Optimize?


        •Hardware
        •Software




   4



Hadoop Servers


        Masters: JobTracker, NameNode, Secondary NameNode
              – Deploy additional RAM and secondary power supplies
              – Ensure highest performance and reliability

        Slaves: DataNodes, TaskTrackers
              – Hadoop Framework handles slave failures well
                        – Data blocks are replicated and distributed
              – Workload may be bound by I/O, memory or processor resources
                        – The system level hardware should be adjusted on a
                          case-by-case basis




   5



Server Platform

        •Dual-socket servers are optimal for Hadoop deployments
        •Dual-socket servers are more efficient than large-scale
        multiprocessor platforms from a per-node cost-benefit perspective
        •Dual-socket servers offset their added per-node hardware cost
        relative to entry-level servers through lower load-balancing
        and parallelization overheads
        •Choosing hardware based on the most current platform
        technologies available helps to ensure optimal intra-server
        throughput and efficiency




   6



Processor Choice Matters




        •Faster
        •Handles more data
        •More energy efficient




   7



Processor Choice Impacts Speed




    Data Source: Intel internal measurements by using Hadoop 0.19.1 as of September 20, 2009.
    Hardware configurations are on slide 22. Performance tests and ratings are measured using specific computer systems and/or components and reflect
    the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration
    may affect actual performance.


   8



Processor Choice Impacts Throughput




       • Throughput = # of tasks completed / minute when cluster is at 100% utilization.
       • Intel Xeon processor 5500 provides up to 86% more throughput than 5400 series.
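
The throughput definition above reduces to simple arithmetic. The following is a minimal sketch of that arithmetic only; the function names and the task counts are hypothetical and were chosen so that the result matches the 86% figure, they are not Intel's measurements.

```python
def throughput(tasks_completed, minutes):
    """Tasks completed per minute over a window in which the cluster is fully utilized."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return tasks_completed / minutes


def percent_gain(baseline, candidate):
    """Relative throughput gain of candidate over baseline, in percent."""
    return 100.0 * (candidate - baseline) / baseline


# Hypothetical task counts, used only to illustrate the calculation.
old = throughput(tasks_completed=1000, minutes=10)  # 100.0 tasks/minute
new = throughput(tasks_completed=1860, minutes=10)  # 186.0 tasks/minute
print(percent_gain(old, new))  # 86.0
```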

    Data Source: Intel internal measurements by using Hadoop 0.19.1 as of September 20, 2009.
    Hardware configurations are on slide 22. Performance tests and ratings are measured using specific computer systems and/or components and reflect
    the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration
    may affect actual performance.


   9



Processor Scaling
     [Figure, two panels: Intel® Xeon® Processor 5400 Series (Harpertown) Cluster and Intel® Xeon® Processor 5500 Series (Nehalem) Cluster. Each panel plots JavaSort completion time (seconds) against number of nodes (1 to 7), with one series per data set size from 1 GB to 200 GB (the 5500 panel also lists 250 GB). Lower values are better.]



     •Hadoop workloads scale well on Intel processors
     •Intel® Xeon® processor 5500 can handle larger data sizes than the 5400 series.

    Data Source: Intel internal measurements by using Hadoop 0.19.0 as of September 20, 2009.
    Hardware configurations are on slide 21. Performance tests and ratings are measured using specific computer systems and/or components and reflect
    the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration
    may affect actual performance.


   10



Turn on Intel® Hyper-threading Technology
        Intel® Hyper-threading Technology increases performance for threaded
        applications, delivering greater throughput and responsiveness.

        [Figure: Intel® Xeon® Processor 5500 Series (Nehalem), SMT effect in 8 node cluster. JavaSort completion time (seconds) against data set size (1 GB to 10 GB), SMT ON versus SMT OFF. Lower values are better.]

        Up to 25% better performance
    Data Source: Intel internal measurements by using Hadoop 0.19.0 as of September 20, 2009.
    Hardware configurations are on slide 21. Performance tests and ratings are measured using specific computer systems and/or components and reflect
    the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration
    may affect actual performance.


   11



Memory


        •Sufficient memory capacity is critical for efficient operation of
        servers in a Hadoop cluster; it supports high throughput by
        allowing a large number of map/reduce tasks to run
        simultaneously
        •Typical Hadoop applications require approximately 1-2 GB of
        RAM per processor core, which corresponds to 8-16 GB for a
        dual-socket server using quad-core processors
        •Error Correcting Code (ECC) memory is highly recommended
        to detect and correct errors introduced during storage and
        transmission of data
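
The per-core sizing rule above can be worked through for any node shape. This is a minimal sketch of the arithmetic; the function name and parameters are hypothetical, and the 1-2 GB per core range is the only figure taken from the slide.

```python
def ram_range_gb(sockets, cores_per_socket, gb_per_core=(1, 2)):
    """Return the (low, high) RAM estimate in GB for one node.

    gb_per_core is the slide's rule of thumb of roughly 1-2 GB per core.
    """
    cores = sockets * cores_per_socket
    low, high = gb_per_core
    return cores * low, cores * high


# Dual-socket server with quad-core processors, as in the slide.
print(ram_range_gb(sockets=2, cores_per_socket=4))  # (8, 16)
```

Actual requirements depend on the workload, so a result like this is a starting point for sizing rather than a fixed rule.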




   12



Selecting Server Motherboard


        •Select server motherboards which are optimized for high
        density computing environments.
              – They should use high efficiency voltage regulators
              – They need to be optimized for airflow
              – They should use certified power supplies

        •Optimized server motherboards will use less power, need less
        cooling, and save money




   13



Hard Disk and SSD


        •Use a relatively large number of hard drives per server (4-6)
        •Hadoop orchestrates data provisioning and redundancy across
        individual nodes (RAID 0 is not needed)
        •SSDs are faster and require very little power; using SSDs
        also eliminates the cooling cost created by hard disk drives
        •Use SSDs:
              – To store mission-critical smaller data sets
              – To store map/reduce intermediate results
              – To replace HDDs, reducing power consumption and
                increasing throughput and performance




   14



Use Intel® X25-E SATA SSDs
     [Figure: 10 Node Intel® Xeon® L5520 (Nehalem) Cluster. JavaSort completion time (seconds) against data set size (1 GB, 10 GB, 50 GB, 80 GB, 100 GB), HDD versus SSD. Lower values are better.]



    Data Source: Intel internal measurements by using Hadoop 0.19.0 as of September 20, 2009.
    Hardware configurations are on slide 23. Performance tests and ratings are measured using specific computer systems and/or components and reflect
    the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration
    may affect actual performance.


   15



System Software


        •Use a Linux* distribution based on kernel version 2.6.30 or
        later because of the optimizations included for energy and
        threading efficiency
          – For example, energy consumption can be up to 60 percent
             (42 watts) higher at idle for each server using older
             versions of Linux
        •Optimize Linux* file system configurations
              – noatime attribute
              – Open file descriptor limit

        •Use the latest Java (for example, Sun Java* 6u14)
              – Use 64-bit optimized JVM builds
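
The two file system items above can be checked on a running node. The sketch below is one possible way to do that, assuming a Linux host where mounted file systems are listed in `/proc/mounts`; the function names and the default list of file system types are hypothetical choices, not part of the original slides.

```python
import resource


def mounts_without_noatime(mounts_text, fs_types=("ext3", "ext4", "xfs")):
    """Return mount points of the given file system types that lack 'noatime'.

    mounts_text is expected in /proc/mounts format:
    device mount_point fs_type options dump pass
    """
    missing = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, fs_type, options = fields[1], fields[2], fields[3]
        if fs_type in fs_types and "noatime" not in options.split(","):
            missing.append(mount_point)
    return missing


def open_file_limit():
    """Return the (soft, hard) open file descriptor limits for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def report(mounts_path="/proc/mounts"):
    """Print both checks for the current host."""
    with open(mounts_path) as handle:
        print("Mounted without noatime:", mounts_without_noatime(handle.read()))
    print("Open file limit (soft, hard):", open_file_limit())
```

Calling `report()` on a DataNode prints the data mounts that still record access times and the descriptor limit inherited by the calling process. Changing either setting is a system administration step (mount options and per-user limits) and is outside this sketch.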



   16



Hadoop Configuration Tuning

        •The number of NameNode and JobTracker threads (10 -> 64)
        •The number of DataNode server threads (3 -> 8)
        •The number of worker threads on the HTTP server that runs on each
        TaskTracker (40-50)
        •HDFS replication factor (3)
        •Default HDFS block size (64 MB -> 128 MB)
        •Maximum number of map/reduce tasks per node
              – (cores_per_node)/2 -> 2*(cores_per_node)
        •The number of input streams (files) to be merged at once in map/reduce
        tasks (example: 100)
        •JVM settings
        •The total size of result and metadata buffers associated with a map task
        (100 MB -> 200 MB)
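
The slide lists the tuning knobs by description only. The sketch below maps them onto property names as an assumption based on the Hadoop 0.19/0.20 configuration defaults; verify each name against the release in use. It also reads the task-count arrow as a range from cores/2 to 2*cores, which is an interpretation, and picks 50 from the 40-50 HTTP thread range arbitrarily. JVM settings are left out because the slide gives no values. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET


def task_slot_range(cores_per_node):
    """Bounds for map/reduce tasks per node, reading the slide's arrow as a range."""
    return cores_per_node // 2, 2 * cores_per_node


def tuned_properties(slots_per_node):
    """Slide values keyed by assumed Hadoop 0.19/0.20 property names."""
    return {
        "dfs.namenode.handler.count": 64,        # NameNode threads, 10 -> 64
        "mapred.job.tracker.handler.count": 64,  # JobTracker threads, 10 -> 64
        "dfs.datanode.handler.count": 8,         # DataNode server threads, 3 -> 8
        "tasktracker.http.threads": 50,          # slide gives 40-50; 50 is one choice
        "dfs.replication": 3,                    # HDFS replication factor
        "dfs.block.size": 128 * 1024 * 1024,     # 128 MB expressed in bytes
        "mapred.tasktracker.map.tasks.maximum": slots_per_node,
        "mapred.tasktracker.reduce.tasks.maximum": slots_per_node,
        "io.sort.factor": 100,                   # streams merged at once
        "io.sort.mb": 200,                       # map-side sort buffer, 100 -> 200 MB
    }


def to_configuration_xml(props):
    """Render properties as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


low, high = task_slot_range(cores_per_node=8)  # (4, 16)
print(to_configuration_xml(tuned_properties(slots_per_node=high)))
```

In Hadoop 0.19 these properties would typically go into `hadoop-site.xml`, while 0.20 splits site configuration across `core-site.xml`, `hdfs-site.xml`, and `mapred-site.xml`. As the summary slide notes, suitable values differ between workload types, so these numbers are starting points to measure against.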


   17



System-stack Example


        Two-way Intel® Xeon® processor 5500 series
        Intel® X25-E SATA SSDs
        Four to six 7200 RPM SATA drives
        12-24 GB DDR3 ECC RAM
        Intel® Server Board S5500WB
        80 PLUS* Gold Certified power supplies
        Linux* based on kernel 2.6.30 or later
        Sun Java* 6u14 or later
        Hadoop* (0.18.3 or 0.20.0)




   18



Summary


        Hardware selection:
        • Intel® Xeon® 5500 (“Nehalem”) improves Hadoop workload
        performance
        • Choosing an optimized server board such as Intel® Server Board
        S5500WB (“WillowBrook”) can reduce power consumption
        • Use Intel® X25-E SATA SSDs to improve performance
        Software & configurations:
        • Use the latest Linux kernel
        • Turn on Intel® Hyper-threading
        • Optimize the Hadoop configuration
        • Tuning may differ between workload types

   19



References:


        1. http://www.intel.com/p/en_US/products/server/processor
        2. http://www.intel.com/it/pdf/server-rightsizing.pdf
        3. http://www.80plus.org/
        4. https://opencirrus.org/content/agenda-open-cirrus-summit-palo-alto-june-8-9-2009




   20



Cluster Configurations Information
        (Slides: “Processor Scaling” and “Turn on Intel® Hyper-threading”)

        Hardware Configuration

        Item               Endeavor                             Atlantis
        Node count         1-10 nodes                           1-10 nodes
        Platform           Intel SR1600UR                       Intel SR1560SF system
                           Intel S5520UR main board             Intel S5400SF main board
                           1U chassis                           1U chassis
        CPU/Stepping       Intel® Xeon® X5560, C1 step          Intel® Xeon® X5482, C0 step
                           (Nehalem EP)                         (Harpertown)
                           2.8 GHz / 6.4 QPI / 1333 / 95 W      3.2 GHz / 12 MB L2 cache
                           1 MB L2 cache, 8 MB L3 cache
        RAM                24 GB total/node                     16 GB
                           6 x 4 GB 1333 MHz Reg ECC DDR3       (FBDIMM, 8 x 2 GB, 667 MHz)
        Chipset            Tylersburg                           Seaburg
        BIOS version       Rev 26                               Rev 22.1
                           08 Apr 2008                          7 Nov 2007
        Interconnects      Gigabit Ethernet                     Gigabit Ethernet
                           QDR InfiniBand                       DDR InfiniBand
        Hard drive specs   Seagate Cheetah NS                   Seagate Barracuda ES
                           400 GB SAS HDD, 10k RPM              250 GB SATA HDD
                           Model: ST3400755SS                   Model: ST3250620NS
                           Using onboard Intel entry-level
                           RAID controller




Cluster Configurations Information
        (Slides: “Processor Choice Impacts Speed” and
        “Processor Choice Impacts Throughput”)
        Intel® Xeon® X5460-based server
        Processor: Dual-socket quad-core Intel® Xeon® X5460, 3.16 GHz
        Memory: 16 GB RAM (DDR2 FBDIMM ECC, 667 MHz)
        Storage: 1 x 300 GB 15K RPM SAS disk for system and log files; 4 x 1 TB 7200 RPM SATA disks for HDFS and
        intermediate results
        Network: 1 Gigabit Ethernet NIC
        BIOS: version S5000.86B.10.60.0091.100920081631; EIST (Enhanced Intel SpeedStep Technology) disabled;
        hardware prefetcher and adjacent cache-line prefetch both disabled


        Intel® Xeon® X5570-based server
        Processor: Dual-socket quad-core Intel® Xeon® X5570, 2.93 GHz
        Memory: 16 GB RAM (DDR3 ECC, 1333 MHz)
        Storage: 1 x 1 TB 7200 RPM SATA disk for system and log files; 4 x 1 TB 7200 RPM SATA disks for HDFS and
        intermediate results
        Network: 1 Gigabit Ethernet NIC
        BIOS: version 4.6.3; EIST (Enhanced Intel SpeedStep Technology) and Turbo mode both disabled;
        hardware prefetcher and adjacent cache-line prefetch both enabled; SMT (Simultaneous Multithreading)
        enabled
        Note: According to our benchmarking, disabling the hardware prefetcher and adjacent cache-line prefetch
        helps improve Hadoop performance on the Xeon X5460 server.
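
        Both servers keep system and log files on one disk and dedicate four SATA disks to HDFS and intermediate results. The slides do not show how this was configured; as a sketch under that caveat, Hadoop releases of this era accept comma-separated directory lists in `dfs.data.dir` (HDFS block storage) and `mapred.local.dir` (intermediate MapReduce data), so a layout like this can be expressed by listing one directory per disk. The mount points and the helper name `spread_dirs` below are hypothetical.

```python
def spread_dirs(mount_points, subdir):
    """Build a comma-separated list with one directory per data disk."""
    return ",".join(f"{m.rstrip('/')}/{subdir}" for m in mount_points)


# Hypothetical mount points for the four 1 TB SATA data disks.
DATA_DISKS = ["/data/1", "/data/2", "/data/3", "/data/4"]

# Property names are standard Hadoop 0.20-era settings; the values are
# illustrative only and are not taken from the benchmarked clusters.
settings = {
    "dfs.data.dir": spread_dirs(DATA_DISKS, "hdfs"),
    "mapred.local.dir": spread_dirs(DATA_DISKS, "mapred"),
}

for name, value in settings.items():
    print(f"{name} = {value}")
```

        Listing every data disk in both properties lets HDFS blocks and intermediate map output spread across all four spindles, while the system disk stays out of the data path.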




Cluster Configurations Information
        (Slides: “Use Intel® X25-E SATA SSD’s”)
        Slaves:
        • Intel® Xeon® L5520 processors (Nehalem) @ 2.27 GHz, 5.8 GT/s QPI, 24 GB RAM
        • Server board: Intel® SB5500WB (Willowbrook)
        • 1x 1 TB SATA HDD boot disk, holds ${HOME} dirs: /
        • 2x 1 TB SATA HDD scratch/experiment disks:
        • 2x 64 GB Intel® X25-E SATA SLC SSD scratch/experiment disks
        • OS: Ubuntu* 9.04, 2.6.28-4 kernel (to enable power saving with preserved performance)


        Master:
        • Intel® Xeon® processors @ 2.93 GHz, 6.4 GT/s QPI, 16 GB RAM
        • Server board: Intel® SB5500WB (Willowbrook)
        • Hard disks:
          • 1x 500 GB SATA OS boot disk (/dev/sda1), holds installed software and ${HOME} dirs
          • 2x 500 GB SATA scratch disks
          • 2x 64 GB Intel® X25-E SATA SLC SSDs
        • OS: RedHat* Enterprise Linux 5.3 Server x64t



 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 

Kürzlich hochgeladen (20)

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 

Hw09 Optimizing Hadoop Deployments

  • 1. Optimizing Hadoop* Workloads Nurcan Coskun Intel Software & Solutions Group October 2, 2009 Acknowledgements to Jason Dai, Intel SSG, for many of the test results and optimization techniques Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Other names and brands may be claimed as the property of others. All products, dates, and figures are preliminary and are subject to change without any notice. Copyright © 2009, Intel Corporation.
  • 2. Legal Disclaimers Disclaimers & Legal Notices THE INFORMATION IS FURNISHED FOR INFORMATIONAL USE ONLY, IS SUBJECT TO CHANGE WITHOUT NOTICE, AND SHOULD NOT BE CONSTRUED AS A COMMITMENT BY INTEL CORPORATION. INTEL CORPORATION ASSUMES NO RESPONSIBILITY OR LIABILITY FOR ANY ERRORS OR INACCURACIES THAT MAY APPEAR IN THIS DOCUMENT OR ANY SOFTWARE THAT MAY BE PROVIDED IN ASSOCIATION WITH THIS DOCUMENT. THIS INFORMATION IS PROVIDED "AS IS" AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THE USE OF THIS INFORMATION INCLUDING WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, COMPLIANCE WITH A SPECIFICATION OR STANDARD, MERCHANTABILITY OR NONINFRINGEMENT. Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, visit Intel Performance Benchmark Limitations INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. 
UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR INTENDED FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR. Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information. The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or by visiting Intel's Web Site http://www.intel.com/.
  • 3. Why Optimize Hadoop Deployments? Handle more data, at lower cost, in less time, with less power.
  • 4. Where to Optimize? Hardware and software.
  • 5. Hadoop Servers
    Masters: JobTracker, NameNode, Secondary NameNode
      – Deploy additional RAM and secondary power supplies
      – Ensure highest performance and reliability
    Slaves: DataNodes, TaskTrackers
      – The Hadoop framework handles slave failures well
      – Data blocks are replicated and distributed
      – Workload may be bound by I/O, memory, or processor resources
      – System-level hardware should be adjusted on a case-by-case basis
  • 6. Server Platform
    • Dual-socket servers are optimal for Hadoop deployments
    • Dual-socket servers are more efficient than large-scale multi-processor platforms from a per-node cost-benefit perspective
    • Dual-socket servers offset their added per-node hardware cost relative to entry-level servers through greater efficiency in load balancing and parallelization overhead
    • Choosing hardware based on the most current platform technologies available helps ensure optimal intra-server throughput and efficiency
  • 7. Processor Choice Matters: faster, handles more data, more energy efficient.
  • 8. Processor Choice Impacts Speed
    Data Source: Intel internal measurements using Hadoop 0.19.1 as of September 20, 2009. Hardware configurations are on slide 22.
  • 9. Processor Choice Impacts Throughput
    • Throughput = number of tasks completed per minute when the cluster is at 100% utilization
    • The Intel Xeon processor 5500 series provides up to 86% more throughput than the 5400 series
    Data Source: Intel internal measurements using Hadoop 0.19.1 as of September 20, 2009. Hardware configurations are on slide 22.
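The throughput metric on slide 9 is simple arithmetic, and a short sketch may make the "up to 86% more" comparison concrete. The function names and the task counts below are hypothetical and chosen only to illustrate the calculation; they are not measurements from the deck.

```python
def throughput(tasks_completed: int, elapsed_minutes: float) -> float:
    """Tasks completed per minute over a window in which the cluster is fully utilized."""
    if elapsed_minutes <= 0:
        raise ValueError("elapsed_minutes must be positive")
    return tasks_completed / elapsed_minutes


def percent_gain(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (new - baseline) / baseline * 100.0


# Hypothetical task counts, picked only to show the arithmetic.
baseline = throughput(tasks_completed=500, elapsed_minutes=10)
candidate = throughput(tasks_completed=930, elapsed_minutes=10)
print(f"{percent_gain(candidate, baseline):.0f}% more throughput")
```

Both clusters need to be measured over the same kind of window (fully utilized, same workload) for the percentage to be meaningful.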
  • 10. Processor Scaling
    [Figure: two charts of JavaSort completion time (seconds) versus number of nodes (1 to 7), one for the Intel® Xeon® Processor 5400 Series (Harpertown) cluster and one for the Intel® Xeon® Processor 5500 Series (Nehalem) cluster, with one series per data set size (1 GB through 10 GB, 50 GB, 100 GB, 150 GB, 200 GB, 250 GB). Lower values are better.]
    • Hadoop workloads scale well on Intel processors
    • The Intel® Xeon® processor 5500 series can handle larger data sizes than the 5400 series
    Data Source: Intel internal measurements using Hadoop 0.19.0 as of September 20, 2009. Hardware configurations are on slide 21.
  • 11. Turn on Intel® Hyper-threading Technology
    Intel® Hyper-threading Technology increases performance for threaded applications, delivering greater throughput and responsiveness.
    [Figure: SMT effect in an 8-node Intel® Xeon® Processor 5500 Series (Nehalem) cluster; JavaSort completion time (seconds) versus data set size (1 GB to 10 GB), SMT on versus SMT off. Lower values are better.]
    Up to 25% better performance.
    Data Source: Intel internal measurements using Hadoop 0.19.0 as of September 20, 2009. Hardware configurations are on slide 21.
  • 12. Memory
    • Sufficient memory capacity is critical for efficient operation of servers in a Hadoop cluster; it supports high throughput by allowing a large number of map/reduce tasks to run simultaneously
    • Typical Hadoop applications require approximately 1-2 GB of RAM per processor core, which corresponds to 8-16 GB for a dual-socket server using quad-core processors
    • Error Correcting Code (ECC) memory is highly recommended to detect and correct errors introduced during storage and transmission of data
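The per-core memory guideline on slide 12 can be turned into a small sizing helper. This is a minimal sketch: the function name and parameters are hypothetical, and the 1-2 GB per core range is the only figure taken from the slide.

```python
def ram_range_gb(sockets: int, cores_per_socket: int,
                 gb_per_core_low: float = 1.0,
                 gb_per_core_high: float = 2.0) -> tuple:
    """Return the (low, high) RAM estimate in GB for one server.

    The default 1-2 GB per core comes from the slide; other values
    can be passed in for workloads that are known to differ.
    """
    cores = sockets * cores_per_socket
    return cores * gb_per_core_low, cores * gb_per_core_high


# Dual-socket server with quad-core processors, as in the slide's example.
low, high = ram_range_gb(sockets=2, cores_per_socket=4)
print(f"{low:.0f}-{high:.0f} GB per server")
```

For the dual-socket, quad-core case this reproduces the 8-16 GB range stated on the slide.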
  • 13. Selecting Server Motherboard
    • Select server motherboards that are optimized for high-density computing environments
      – They should use high-efficiency voltage regulators
      – They need to be optimized for airflow
      – They should use certified power supplies
    • Optimized server motherboards use less power, need less cooling, and save money
  • 14. Hard Disk and SSD
    • Use a large number of hard drives per server (4-6)
    • Hadoop orchestrates data provisioning and redundancy across individual nodes (RAID 0 is not needed)
    • SSDs are faster and require very little power; using SSDs also eliminates the cooling cost created by hard disk drives
    • Use SSDs:
      – To store mission-critical smaller data sets
      – To store map/reduce intermediate results
      – To replace HDDs, reducing power consumption, increasing throughput, and improving performance
  • 15. Use Intel® X25-E SATA SSD’s
    [Figure: 10-node Intel® Xeon® L5520 (Nehalem) cluster; JavaSort completion time (seconds) versus data set size (1 GB, 10 GB, 50 GB, 80 GB, 100 GB), HDD versus SSD. Lower values are better.]
    Data Source: Intel internal measurements using Hadoop 0.19.0 as of September 20, 2009. Hardware configurations are on slide 23.
  • 16. System Software
    • Use a Linux* distribution based on kernel version 2.6.30 or later, which includes optimizations for energy and threading efficiency
      – For example, energy consumption at idle can be up to 60 percent (42 watts) higher per server with older versions of Linux
    • Optimize Linux* file system configurations
      – noatime attribute
      – Open file descriptor limit
    • Use the latest Java (for example, Sun Java* 6u14)
      – Use 64-bit optimized JVM builds
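The two file system items on slide 16 can be checked on a running node. The sketch below assumes a Linux host and parses text in the format of /proc/mounts. The helper names and the /data/1 mount point are hypothetical, and the slide does not say which limit value to target.

```python
import resource
from typing import List


def mount_options(mounts_text: str, mount_point: str) -> List[str]:
    """Return the mount options of the first entry for `mount_point`.

    `mounts_text` is expected in /proc/mounts format:
    device, mount point, fs type, options, dump, pass.
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            return fields[3].split(",")
    return []


def has_noatime(mounts_text: str, mount_point: str) -> bool:
    """True if the file system at `mount_point` is mounted with noatime."""
    return "noatime" in mount_options(mounts_text, mount_point)


def open_file_limits() -> tuple:
    """Soft and hard open-file-descriptor limits for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


# Sample text standing in for /proc/mounts; /data/1 is a made-up data directory.
sample = (
    "/dev/sda1 / ext3 rw,relatime 0 0\n"
    "/dev/sdb1 /data/1 ext3 rw,noatime 0 0\n"
)
print("noatime on /data/1:", has_noatime(sample, "/data/1"))
print("open file limits (soft, hard):", open_file_limits())
```

On a real node, the contents of /proc/mounts would be passed in place of the sample text, and the check would be run as the user that starts the Hadoop daemons, since the descriptor limit is per process.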
  • 17. Hadoop Configuration Tuning
    • The number of NameNode and JobTracker threads (10 -> 64)
    • The number of DataNode server threads (3 -> 8)
    • The number of worker threads on the HTTP server that runs on each TaskTracker (40-50)
    • HDFS replication factor (3)
    • Default HDFS block size (64 MB -> 128 MB)
    • Maximum number of map/reduce tasks per node
      – (cores_per_node)/2 -> 2*(cores_per_node)
    • The number of input streams (files) to be merged at once in map/reduce tasks (example: 100)
    • JVM settings
    • The total size of result and metadata buffers associated with a map task (100 MB -> 200 MB)
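The tuning list on slide 17 can be sketched as a generated configuration file. The slide names no property keys, so the mapping below to Hadoop 0.19/0.20-era property names is an assumption, as are the helper names, the `tasks_per_core` parameter, the choice of 50 from the slide's 40-50 range, and the use of one task limit for both map and reduce slots. JVM settings are omitted because the slide gives no values.

```python
from xml.sax.saxutils import escape


def tuned_properties(cores_per_node: int, tasks_per_core: float = 1.0) -> dict:
    """Build property values following the slide's suggested changes.

    The slide gives cores/2 to 2*cores for the per-node task limit,
    so `tasks_per_core` is restricted to the range 0.5 to 2.
    """
    if not 0.5 <= tasks_per_core <= 2.0:
        raise ValueError("tasks_per_core should be between 0.5 and 2")
    max_tasks = max(1, int(cores_per_node * tasks_per_core))
    return {
        # Property names are assumed mappings, not taken from the slide.
        "dfs.namenode.handler.count": 64,            # NameNode threads, 10 -> 64
        "mapred.job.tracker.handler.count": 64,      # JobTracker threads, 10 -> 64
        "dfs.datanode.handler.count": 8,             # DataNode threads, 3 -> 8
        "tasktracker.http.threads": 50,              # slide gives 40-50
        "dfs.replication": 3,
        "dfs.block.size": 128 * 1024 * 1024,         # 64 MB -> 128 MB, in bytes
        "mapred.tasktracker.map.tasks.maximum": max_tasks,
        "mapred.tasktracker.reduce.tasks.maximum": max_tasks,
        "io.sort.factor": 100,                       # streams merged at once
        "io.sort.mb": 200,                           # map-side buffer, 100 -> 200 MB
    }


def to_site_xml(props: dict) -> str:
    """Render properties in the Hadoop <configuration> XML layout."""
    lines = ['<?xml version="1.0"?>', "<configuration>"]
    for name, value in props.items():
        lines.append("  <property>")
        lines.append(f"    <name>{escape(name)}</name>")
        lines.append(f"    <value>{escape(str(value))}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


# Dual-socket quad-core node: 8 cores, one task slot per core.
print(to_site_xml(tuned_properties(cores_per_node=8)))
```

Which file these properties belong in depends on the Hadoop version: a single site file in the 0.18/0.19 releases, and separate HDFS and MapReduce site files from 0.20 on. As the summary slide notes, the right values may differ by workload type, so these are starting points to benchmark, not fixed recommendations.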
  • 18. System-stack Example
    • Two-way Intel® Xeon® processor 5500 series
    • Intel® X25-E SATA SSDs
    • Four to six 7200 RPM SATA drives
    • 12-24 GB DDR3 ECC RAM
    • Intel® Server Board S5500WB
    • 80 PLUS* Gold Certified power supplies
    • Linux* based on kernel 2.6.30 or later
    • Sun Java* 6u14 or later
    • Hadoop* (0.18.3 or 0.20.0)
  • 19. Summary
    Hardware selection:
    • Intel® Xeon® 5500 (“Nehalem”) improves Hadoop workload performance
    • Choosing an optimized server board such as the Intel® Server Board S5500WB (“Willowbrook”) can reduce power consumption
    • Use Intel® X25-E SATA SSDs to improve performance
    Software & configurations:
    • Use the latest Linux kernel
    • Turn on Intel® Hyper-threading
    • Optimize the Hadoop configuration
    • Tuning may differ for different workload types
  • 20. References:
    1. http://www.intel.com/p/en_US/products/server/processor
    2. http://www.intel.com/it/pdf/server-rightsizing.pdf
    3. http://www.80plus.org/
    4. https://opencirrus.org/content/agenda-open-cirrus-summit-palo-alto-june-8-9-2009
  • 21. Cluster Configurations Information (Slides: “Processor Scaling” and “Turn on Intel® Hyper-threading”)
    Hardware configuration (Item: Endeavor | Atlantis)
      – Node count: 1-10 nodes | 1-10 nodes
      – Platform: Intel SR1600UR, Intel S5520UR main board, 1U chassis | Intel SR1560SF system, Intel S5400SF main board, 1U chassis
      – CPU/Stepping: Intel® Xeon® X5560, C1 step (Nehalem EP), 2.8GHz / 6.4 QPI 1333 95 W, 1MB L2 cache, 8M L3 cache | Intel® Xeon® X5482, C0 step (Harpertown), 3.2 GHz / 12 MB L2 cache
      – RAM: 24 GB total/node, 6*4GB 1333MHz Reg ECC DDR3 | 16 GB (FBDIMM 8x2-GB 667MHz)
      – Chipset: Tylersburg | Seaburg
      – BIOS version: Rev 26, 08 Apr 2008 | Rev 22.1, 7 Nov 2007
      – Interconnects: Gigabit Ethernet, QDR InfiniBand | Gigabit Ethernet, DDR InfiniBand
      – Hard drive specs: Seagate Cheetah NS 400 GB SAS HDD, 10kRPM, model ST3400755SS | Seagate Barracuda ES 250 GB SATA HDD, model ST3250620NS
      – Using onboard Intel Entry Level RAID controller
  • 22. Cluster Configurations Information (Slides: “Processor Choice Impacts Speed” and “Processor Choice Impacts Throughput”)
    Intel® Xeon® X5460-based server
      – Processor: Dual-socket quad-core Intel® Xeon® X5460 3.16GHz processor
      – Memory: 16GB (DDR2 FBDIMM ECC 667MHz) RAM
      – Storage: 1 x 300GB 15K RPM SAS disk for system and log files, 4 x 1TB 7200RPM SATA for HDFS and intermediate results
      – Network: 1 Gigabit Ethernet NIC
      – BIOS: BIOS version S5000.86B.10.60.0091.100920081631; EIST (Enhanced Intel SpeedStep Technology) disabled; both hardware prefetcher and adjacent cache-line prefetch disabled
    Intel® Xeon® X5570-based server
      – Processor: Dual-socket quad-core Intel® Xeon® X5570 2.93GHz processor
      – Memory: 16GB (DDR3 ECC 1333MHz) RAM
      – Storage: 1 x 1TB 7200RPM SATA for system and log files, 4 x 1TB 7200RPM SATA for HDFS and intermediate results
      – Network: 1 Gigabit Ethernet NIC
      – BIOS: BIOS version 4.6.3; both EIST (Enhanced Intel SpeedStep Technology) and Turbo mode disabled; both hardware prefetcher and adjacent cache-line prefetch enabled; SMT (Simultaneous MultiThreading) enabled
    (Disabling hardware prefetcher and adjacent cache-line prefetch helps improve Hadoop performance on the Xeon X5460 server according to our benchmarking.)
  • 23. Cluster Configurations Information (Slides: “Use Intel® X25-E SATA SSD’s”)
    Slaves:
      • Intel® Xeon® L5520 processor (Nehalem) @ 2.27 GHz CPUs, 5.8 GB/sec QPI, 24 GB RAM
      • Server board: Intel® S5500WB (Willowbrook)
      • 1x 1 TB SATA HDD boot disk, holds ${HOME} dirs: /
      • 2x 1 TB SATA HDD scratch/experiment disks
      • 2x 64 GB Intel® X25-E SATA SLC SSD scratch/experiment disks
      • OS: Ubuntu* 9.04 == 2.6.28-4 kernel (to enable power saving with preserved performance)
    Master:
      • Intel® Xeon® processor 2.93 GHz CPUs, 6.4 GB/sec QPI, 16 GB RAM
      • Server board: Intel® S5500WB (Willowbrook)
      • Hard disks:
        – 1x 500 GB SATA OS boot disk (/dev/sda1), holds installed software and ${HOME} dirs
        – 2x 500 GB SATA scratch disks
        – 2x 64 GB Intel® X25-E SATA SLC SSDs
      • OS: RedHat* Enterprise Linux 5.3 Server x64t