Managing a Large Hadoop Cluster


Jeff Hammerbacher
Manager, Data
May 28 - 29, 2008
Anatomy of the Facebook Cluster
Hardware
▪   Individual nodes
    ▪   CPU: Intel Xeon dual socket quad cores (8 cores per box)
    ▪   Memory: 16 GB ECC DRAM
    ▪   Disk: 4 x 1 TB 7200 RPM SATA
    ▪   Network: 1 GbE
▪   Topology
    ▪   320 nodes arranged into 8 racks of 40 nodes each
    ▪   8 x 1 Gbps links out to the core switch
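The per-node figures above can be rolled up into cluster totals. The following is a minimal sketch, not part of the original slides. The totals follow directly from the listed numbers; the oversubscription ratio additionally assumes that the 8 x 1 Gbps uplinks are per rack, which the slide does not state.

```python
# Figures taken from the hardware slide.
NODES = 320
RACKS = 8
NODES_PER_RACK = 40
CORES_PER_NODE = 8
MEMORY_GB_PER_NODE = 16
DISKS_PER_NODE = 4
DISK_TB = 1
NIC_GBPS = 1
UPLINKS = 8
UPLINK_GBPS = 1


def cluster_totals():
    """Aggregate cores, memory, and raw (pre-replication) disk."""
    return {
        "cores": NODES * CORES_PER_NODE,
        "memory_gb": NODES * MEMORY_GB_PER_NODE,
        "raw_disk_tb": NODES * DISKS_PER_NODE * DISK_TB,
    }


def rack_oversubscription():
    """Ratio of in-rack NIC bandwidth to uplink bandwidth.

    ASSUMPTION: the 8 x 1 Gbps uplinks are per rack, not shared
    by the whole cluster.
    """
    return (NODES_PER_RACK * NIC_GBPS) / (UPLINKS * UPLINK_GBPS)


if __name__ == "__main__":
    print(cluster_totals())
    print(rack_oversubscription())
```

Raw disk is counted before HDFS replication, so usable capacity is lower by whatever replication factor is in effect.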
Functional Separation
▪   Need to have test, staging, and production clusters
▪   Break nodes into groups of 10
▪   First 30 machines on each rack run DFS
▪   Last 10 machines used for DFS and upgrade testing or left idle
▪   Run main MapReduce cluster on 20 machines in each rack
▪   Run test MapReduce cluster on 10 machines in four racks
▪   Do MapReduce testing on 10 machines in four racks
▪   A few other MapReduce clusters for isolated applications
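The grouping rule for DFS can be sketched as follows. This is an illustration only: the host names are hypothetical, and the slide does not say which groups within a rack carry the MapReduce clusters, so the sketch covers only the DFS split (first 30 machines versus last 10).

```python
def rack_groups(rack_hosts, group_size=10):
    """Split one rack's hosts into consecutive groups of `group_size`."""
    return [rack_hosts[i:i + group_size]
            for i in range(0, len(rack_hosts), group_size)]


def dfs_roles(rack_hosts):
    """First three groups run DFS; the last group is for DFS and
    upgrade testing, or is left idle."""
    groups = rack_groups(rack_hosts)
    production = [host for group in groups[:3] for host in group]
    return {"dfs": production, "dfs_test_or_idle": groups[3]}


if __name__ == "__main__":
    # Hypothetical host names for a single 40-node rack.
    hosts = ["rack1-node%02d" % i for i in range(40)]
    roles = dfs_roles(hosts)
    print(len(roles["dfs"]), len(roles["dfs_test_or_idle"]))
```

Across all 8 racks this gives 240 machines running DFS and 80 held back for testing or idle.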
Software for Administration
▪   Most utilities are included in hadoop/bin
    ▪   Format DFS, start/stop daemons, fsck, rebalance blocks, etc.
▪   Hypershell (internal): provides distributed shell functionality
    ▪   See also: dsh, GXP, Capistrano, ClusterIt
▪   Cfengine: ensure uniform system images, configuration, and libraries
▪   ODS (internal): monitoring and alerting
    ▪   See also: Ganglia for monitoring, Nagios for alerting
▪   Cacti: network monitoring
Excerpts from Facebook’s conf/hadoop-site.xml
Property                            Value                                       Comment
dfs.block.size                      134,217,728                                 Larger block size for less NN metadata
dfs.datanode.du.reserved            1,024,000,000                               Don’t fill up the local disk
dfs.namenode.handler.count          40                                          More NN server threads for DN RPCs
dfs.network.script                  /mnt/vol/hive/stable/bin/rackid.pl          Print machine network name
fs.trash.interval                   1,440
fs.trash.root                       /Trash
io.file.buffer.size                 32,768                                      Size of r/w buffer used by SequenceFile
io.sort.factor                      100                                         More streams merged while sorting
io.sort.mb                          200                                         Higher memory limit while sorting data
mapred.child.java.opts              -Xmx1024m -Djava.net.preferIPv4Stack=true   Large heap size; avoid RPC timeout
mapred.linerecordreader.maxlength   1,000,000                                   Skip malformed lines
mapred.min.split.size               65,536
mapred.reduce.copy.backoff          5
mapred.reduce.parallel.copies       20                                          More threads to fetch map output data
mapred.tasktracker.tasks.maximum    5
mapred.speculative.map.enabled      TRUE
mapred.speculative.reduce.enabled   FALSE
mapred.speculative.map.gap          1
webinterface.private.actions        TRUE
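For readers unfamiliar with the file format, hadoop-site.xml holds each setting as a `<property>` element with `<name>` and `<value>` children inside a `<configuration>` root. The sketch below renders a handful of the settings above into that shape. It is an illustration rather than Facebook's actual file; the thousands separators on the slide are presumably for readability, so the values are written here as plain integers.

```python
import xml.etree.ElementTree as ET

# A subset of the settings from the slide, as plain strings.
SETTINGS = {
    "dfs.block.size": "134217728",          # 128 * 1024 * 1024 bytes
    "dfs.datanode.du.reserved": "1024000000",
    "dfs.namenode.handler.count": "40",
    "io.sort.factor": "100",
    "io.sort.mb": "200",
}


def render_site_xml(settings):
    """Return a <configuration> document with one <property> per setting."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(render_site_xml(SETTINGS))
```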
HDFS Tips from Dhruba Borthakur
▪   Be careful when using profilers to examine NN state
▪   Never load many small files
▪   Always use Java 1.6, otherwise NN will consume about 50% more CPU
▪   When decommissioning DNs, do a max of 10 machines or so at a time,
    otherwise the NN gets overloaded
▪   Run fsck every night and monitor the number of missing and
    under-replicated blocks
▪   If a block stays unreplicated, force its replication factor up, then down
▪   When adding new DNs to the cluster, run the rebalancing script
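Two of the tips above lend themselves to small helpers: decommissioning in batches of about 10, and watching the nightly fsck counts. The sketch below is an assumption-laden illustration. The summary labels ("Under-replicated blocks", "Missing replicas", "Corrupt blocks") are modelled on fsck output from later Hadoop releases and may differ in the versions discussed here, and the sample text is invented for the example.

```python
import re

# Labels to watch in the fsck summary. ASSUMPTION: these match the
# wording of later Hadoop releases.
WATCHED = ("Under-replicated blocks", "Missing replicas", "Corrupt blocks")


def batches(hosts, size=10):
    """Yield hosts in groups of at most `size` for staged decommissioning."""
    for i in range(0, len(hosts), size):
        yield hosts[i:i + size]


def parse_fsck_summary(text):
    """Extract the integer count for each watched label, if present."""
    counts = {}
    for line in text.splitlines():
        label, sep, rest = line.strip().partition(":")
        if sep and label in WATCHED:
            match = re.match(r"\s*(\d+)", rest)
            if match:
                counts[label] = int(match.group(1))
    return counts


# Hypothetical fsck summary excerpt, for illustration only.
SAMPLE = (
    " Total blocks (validated):\t1200\n"
    " Under-replicated blocks:\t12 (1.0 %)\n"
    " Missing replicas:\t\t15 (0.4 %)\n"
)

if __name__ == "__main__":
    print(parse_fsck_summary(SAMPLE))
    print([len(b) for b in batches(list(range(25)))])
```

A nightly cron job could feed the real fsck output to `parse_fsck_summary` and alert when any count rises from one night to the next.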
Common Issues
▪   Client libraries out of sync
▪   Non-uniform availability of software or libraries on TT nodes
▪   Bad disk: manifested as ROFS
▪   NIC decides to go into 100 Mbps Ethernet mode
▪   DN reserved amount not honored resulting in disk filled to capacity
▪   Resource contention
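Two of these failure modes can be detected from standard Linux interfaces: a file system remounted read-only shows `ro` among its options in /proc/mounts, and the negotiated link speed is exposed in Mbit/s under /sys/class/net/<iface>/speed. The following is a minimal sketch of such checks, not the scripts Facebook used. It works on text passed in by the caller, and the mount points in the example are hypothetical.

```python
def read_only_mounts(mounts_text, data_dirs):
    """Return the entries of `data_dirs` that are mounted read-only.

    `mounts_text` is text in /proc/mounts format:
    device, mount point, fs type, comma-separated options, ...
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        mount_point, options = fields[1], fields[3].split(",")
        if mount_point in data_dirs and "ro" in options:
            found.append(mount_point)
    return found


def nic_is_degraded(speed_text, expected_mbps=1000):
    """True if the reported link speed is below the expected speed.

    `speed_text` is the content of /sys/class/net/<iface>/speed.
    An unknown speed (reported as -1) is also flagged.
    """
    return int(speed_text.strip()) < expected_mbps


if __name__ == "__main__":
    # Hypothetical mount table excerpt.
    sample = (
        "/dev/sda1 / ext3 rw,noatime 0 0\n"
        "/dev/sdb1 /data1 ext3 ro,noatime 0 0\n"
    )
    print(read_only_mounts(sample, {"/data1", "/data2"}))
    print(nic_is_degraded("100\n"))
```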
More About Monitoring
▪   Hadoop has an abstract interface for metrics reporting
    ▪   org.apache.hadoop.metrics.spi
    ▪   Currently has “file” and “ganglia” implementations
    ▪   Every Metric belongs to a Context and a Record
    ▪   Metrics can also have Tags for disambiguation
    ▪   See conf/hadoop-metrics.properties for configuration
▪   Web interfaces to NN and JT also have detailed information
▪   A variety of cron’d scripts also take care of system-level monitoring
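The relationship between Context, Record, Metric, and Tag can be pictured with a small data model. The sketch below is an illustration in Python, not the Java API in org.apache.hadoop.metrics; the context, record, tag, and metric names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A named group of metrics inside a context, qualified by tags."""
    context: str
    name: str
    tags: dict = field(default_factory=dict)
    metrics: dict = field(default_factory=dict)

    def key(self):
        """Identity of the record: tags disambiguate otherwise equal records."""
        return (self.context, self.name, tuple(sorted(self.tags.items())))


if __name__ == "__main__":
    # Hypothetical names: two hosts report the same metric.
    a = Record("dfs", "datanode", {"host": "node01"}, {"bytes_written": 10})
    b = Record("dfs", "datanode", {"host": "node02"}, {"bytes_written": 25})
    print(a.key() != b.key())
```

Here both records share a context and a record name, and only the tag keeps their values apart.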
More About Performance
▪   In addition to the metrics package, logs are a rich source of information
    ▪   Starting to regularly parse logs and store information into MySQL db
▪   Multiple research labs working on this area
    ▪   Berkeley RAD Lab
    ▪   Carnegie Mellon PDL
    ▪   Watch OSDI this year for papers
Recent DFS Performance Numbers
▪   All DNs are on the same rack, so switch performance does not affect the test
▪   8 DNs, each with 2 map slots: hence performance levels off at 16 files
▪   Each mapper writes 1 GB/file. Block size is 128MB. Replication factor is 3.
▪   Uses Java 1.6
    Number of Files    0.15.4 (MB/s)    0.17.0 (MB/s)
    1                  30               60
    2                  25               53
    3                  20               43
    5                  18               33
    8                  9                27
    13                 8                18
    20                 9                17
    24                 8                18
    28                 8                16
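The improvement between the two releases is easier to read as a ratio. The sketch below computes the 0.17.0 to 0.15.4 throughput ratio for each row of the table above; the data is copied from the slide and nothing else is assumed.

```python
# (number of files, 0.15.4 MB/s, 0.17.0 MB/s), copied from the table.
RESULTS = [
    (1, 30, 60),
    (2, 25, 53),
    (3, 20, 43),
    (5, 18, 33),
    (8, 9, 27),
    (13, 8, 18),
    (20, 9, 17),
    (24, 8, 18),
    (28, 8, 16),
]


def speedups(rows):
    """Return (files, ratio of new to old throughput) for each row."""
    return [(files, new / old) for files, old, new in rows]


if __name__ == "__main__":
    for files, ratio in speedups(RESULTS):
        print("%2d files: %.2fx" % (files, ratio))
```

In every row 0.17.0 is at least about 1.8 times faster than 0.15.4, with the largest gap (3x) at 8 files.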
X-Trace + Hadoop
HDFS Performance analysis
Resource Management and Job Scheduling
▪   By far the most intensive cluster management responsibility
▪   At Facebook: manually set job priorities and kill jobs
▪   HOD (Hadoop On Demand)
    ▪   Integrates with Torque resource manager
    ▪   Torque frequently paired with Maui cluster scheduler
    ▪   Other options
        ▪   Sun Grid Engine
        ▪   Condor
        ▪   Platform LSF (commercial)
Manual Job Scheduling
Job Priorities and “Kill this Job” from JT Web Interface
Recent Cluster Statistics
▪   From May 2nd to May 21st:
    ▪   Total jobs: 8,794
    ▪   Total map tasks: 1,362,429
    ▪   Total reduce tasks: 86,806
    ▪   Average duration of a successful job: 296 s
    ▪   Average duration of a successful map: 81 s
    ▪   Average duration of a successful reduce: 678 s
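A few derived figures help put these totals in perspective. The sketch below computes tasks per job and jobs per day. It assumes the period is in 2008 (the year of the talk) and counts both May 2nd and May 21st, giving 20 days; the slide states neither.

```python
from datetime import date

# Totals copied from the slide.
JOBS = 8794
MAP_TASKS = 1362429
REDUCE_TASKS = 86806

# ASSUMPTION: the year is 2008 and both end dates are included.
DAYS = (date(2008, 5, 21) - date(2008, 5, 2)).days + 1


def derived_stats():
    """Average tasks per job and jobs per day over the period."""
    return {
        "maps_per_job": MAP_TASKS / JOBS,
        "reduces_per_job": REDUCE_TASKS / JOBS,
        "jobs_per_day": JOBS / DAYS,
    }


if __name__ == "__main__":
    for name, value in derived_stats().items():
        print("%s: %.2f" % (name, value))
```

That works out to roughly 155 map tasks and 10 reduce tasks per job, at about 440 jobs per day.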
(c) 2008 Facebook, Inc. or its licensors. "Facebook" is a registered trademark of Facebook, Inc. All rights reserved. 1.0
