Hadoop Monitoring Best Practices

Monitoring Hadoop with Cacti and Nagios


  1. How to monitor the $H!T out of Hadoop: developing a comprehensive open approach to monitoring Hadoop clusters
  2. Relevant Hadoop Information
     - From 3 to 3000 nodes
     - Hardware/software failures are “common”
     - Redundant components: DataNode, TaskTracker
     - Non-redundant components: NameNode, JobTracker, SecondaryNameNode
     - Fast-evolving technology (best practices?)
  3. Monitoring Software
     - Nagios
       - Red/yellow/green alerts, escalations
       - De facto standard, widely deployed
       - Text-based configuration
       - Web interface
       - Pluggable with shell scripts/external apps
         - Return 0 = OK
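The "Return 0 = OK" convention on the Nagios slide can be sketched as follows. This is a minimal illustration, not code from the deck: the function names and the sample thresholds are assumptions, while the exit codes 0 to 3 follow the standard Nagios plugin convention.

```python
"""Sketch of the Nagios plugin convention: one status line plus an exit code."""

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(value, warn, crit):
    """Map a measurement to an exit code; higher values are treated as worse."""
    if value is None:
        return UNKNOWN  # the measurement could not be taken
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def plugin_line(label, value, warn, crit):
    """Return (exit_code, status_line) for a single measurement."""
    code = classify(value, warn, crit)
    return code, f"{label} {STATUS_NAMES[code]} - value={value}"


# Hypothetical sample: a disk that is 91% full with warn=80 and crit=90.
code, line = plugin_line("DISK", 91, warn=80, crit=90)
print(line)
```

A real plugin would print the status line and then call `sys.exit(code)` so that Nagios can read the result from the process exit status.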
  4. Cacti
     - Performance graphing system
     - RRD/RRA front end
     - Slick web interface
     - Template system for graph types
     - Pluggable
       - SNMP input
       - Shell script/external program
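For the "shell script/external program" input path, Cacti reads whatever the script prints. A script that returns several fields is expected to print space-separated `name:value` pairs on one line. The sketch below only shows that formatting step; the metric names and numbers are hypothetical sample data, not values from the deck.

```python
def cacti_line(metrics):
    """Format metrics as the space-separated name:value pairs Cacti parses."""
    return " ".join(f"{name}:{value}" for name, value in metrics.items())


# Hypothetical sample values standing in for real measurements.
sample = {"files_total": 120000, "blocks_total": 150000}
print(cacti_line(sample))
```

The field names printed by the script have to match the output fields defined in the corresponding Cacti data input method.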
  6. hadoop-cacti-jtg
     - JMX fetching code with kick-off scripts
     - Cacti templates for Hadoop
     - Premade Nagios check scripts
     - Helper/batch/automation scripts
     - Apache License
  7. Hadoop JMX
  8. Sample Cluster, Part 1
     - NameNode & SecNameNode
       - Hardware RAID
       - 8 GB RAM
       - 1x quad core
       - DerbyDB (Hive) on SecNameNode
     - JobTracker
       - 8 GB RAM
       - 1x quad core
  9. Sample Cluster, Part 2
     - Slave (hadoopdata1-XXXX)
       - JBOD, 8x 1 TB SATA disks
       - 16 GB RAM
       - 2x quad core
  10. Prerequisites
     - Nagios (install): DAG RPMs
     - Cacti (install): several RPMs
     - Liberal network access to the cluster
  11. Alerts & Escalations
     - X nodes * Y services = less sleep
     - Define a policy
       - Wake Me Up's (SMS)
       - Don't Wake Me Up's (email)
       - Review (daily, weekly, monthly)
  12. Wake Me Up's
     - NameNode
       - Disk full (big, big headache)
       - RAID array issues (failed disk)
     - JobTracker
     - SecNameNode
       - Easy to not realize it has stopped working until it is too late
  13. Don't Wake Me Up's
     - Or "wake someone else up"
     - DataNode
       - Warning: currently, a failed disk will take down the DataNode (see Jira)
     - TaskTracker
     - Hardware
       - Bad disk (start RMA)
     - Slaves are expendable (up to a point)
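The wake-up policy from the last three slides can be written down as a small routing table. This is only a sketch: the component-to-channel mapping follows the slides, but the function name, the e-mail default for unknown components, and the 50% escalation cut-off for "expendable up to a point" are assumptions.

```python
SMS, EMAIL = "sms", "email"

# Non-redundant components page someone; redundant ones only send e-mail.
POLICY = {
    "NameNode": SMS,
    "JobTracker": SMS,
    "SecondaryNameNode": SMS,
    "DataNode": EMAIL,
    "TaskTracker": EMAIL,
}


def channel_for(component, fraction_down=0.0, escalate_at=0.5):
    """Pick a notification channel for a failed component.

    fraction_down is the share of slave nodes currently down; above
    escalate_at (an assumed cut-off) even slave failures are escalated.
    """
    channel = POLICY.get(component, EMAIL)  # assumption: unknown -> e-mail
    if channel == EMAIL and fraction_down > escalate_at:
        return SMS
    return channel


print(channel_for("NameNode"))
print(channel_for("DataNode"))
print(channel_for("DataNode", fraction_down=0.6))
```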
  14. Monitoring Battle Plan
     - Start with the basics
       - Ping, disk
     - Add Hadoop-specific alarms
       - check_data_node
     - Add JMX graphing
       - NameNodeOperations
     - Add JMX-based alarms
       - FilesTotal > 1,000,000 or LiveNodes < 50%
  15. The Basics: Nagios
     - Nagios (all nodes)
       - Host up (ping check)
       - Disk % full
       - Swap > 85%
     - Note: load-based alarms are somewhat useless; 389% CPU load is not necessarily a bad thing in Hadoopville
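The disk and swap checks listed on the basics slide could be computed roughly as below. This is a sketch under stated assumptions: the function names are made up, the swap figure is parsed from the text of Linux's `/proc/meminfo`, and only the 85% swap threshold comes from the slide.

```python
import shutil


def disk_percent_full(path="/"):
    """Percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def swap_percent_used(meminfo_text):
    """Compute swap usage from the contents of /proc/meminfo (Linux only)."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])  # values are reported in kB
    total = fields.get("SwapTotal", 0)
    if total == 0:
        return 0.0  # no swap configured
    return 100.0 * (total - fields.get("SwapFree", 0)) / total


def swap_alarm(meminfo_text, threshold=85.0):
    """True when swap usage exceeds the slide's 85% threshold."""
    return swap_percent_used(meminfo_text) > threshold


print(f"root filesystem is {disk_percent_full('/'):.1f}% full")
```

On a Linux node the swap check would be fed with `open("/proc/meminfo").read()`.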
  16. The Basics: Cacti
     - Cacti (all nodes)
       - CPU (full CPU)
       - RAM/swap
       - Network
       - Disk usage
  17. Disk Utilization
  18. RAID Tools
     - hpacucli (not a Street Fighter move)
       - Alerts on RAID events (NameNode)
         - Disk failed
         - Rebuilding
       - JBOD (DataNode)
         - Failed drive
         - Drive errors
     - Dell, Sun, and other vendor-specific tools
  19. Before You Jump In
     - X nodes * Y checks = lots of work
     - About 3 nodes into the process ...
       - Wait!!! I need some interns!!!
     - Solution: S.I.C.C.T. (Semi-Intelligent Configuration Cloning Tools)
       - (I made that up)
       - (for this presentation)
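The configuration-cloning idea can be illustrated by stamping one Nagios service template out for every slave node. This is a sketch, not the deck's tooling: the host names follow the `hadoopdata1-XXXX` pattern from the sample cluster, port 50075 is the DataNode web port listed in the deck, and `check_remote_datanode` is a hypothetical command name.

```python
SERVICE_TEMPLATE = """define service {{
    service_description  {description}
    use                  generic-service
    host_name            {host}
    check_command        {command}!{port}
}}
"""


def service_definitions(hosts, description, command, port):
    """Render one Nagios service definition per host."""
    return "\n".join(
        SERVICE_TEMPLATE.format(
            host=host, description=description, command=command, port=port
        )
        for host in hosts
    )


# Hypothetical three-node example; a real cluster would list every slave.
hosts = [f"hadoopdata{i}" for i in range(1, 4)]
print(service_definitions(hosts, "check_remote_datanode",
                          "check_remote_datanode", 50075))
```

Writing the result to a file that Nagios includes keeps the per-node definitions generated rather than hand-edited.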
  20. Nagios
     - Answers "IS IT RUNNING?"
     - Text-based configuration
  21. Cacti
     - Answers "HOW WELL IS IT RUNNING?"
     - Web-based configuration
       - php-cli tools
  22. Monitoring Battle Plan Thus Far
     - Start with the basics (done)
       - Ping, disk
     - Add Hadoop-specific alarms
       - check_data_node
     - Add JMX graphing
       - NameNodeOperations
     - Add JMX-based alarms
       - FilesTotal > 1,000,000 or LiveNodes < 50%
  23. Add Hadoop-Specific Alarms
     - Hadoop components with a web interface
       - NameNode: 50070
       - JobTracker: 50030
       - TaskTracker: 50060
       - DataNode: 50075
     - check_http + regex = simple + effective
  24. nagios_check_commands.cfg
     - Component failure
     - (Future) Newer Hadoop will have XML status

     define command {
         command_name  check_remote_namenode
         command_line  $USER1$/check_http -H $HOSTADDRESS$ -u http://$HOSTADDRESS$:$ARG1$/dfshealth.jsp -p $ARG1$ -r NameNode
     }

     define service {
         service_description  check_remote_namenode
         use                  generic-service
         host_name            hadoopname1
         check_command        check_remote_namenode!50070
     }
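What `check_http` with a regex does for the NameNode page can be approximated in a few lines. This is a sketch of the idea rather than a replacement for the plugin: the function names are assumptions, and only the URL path `dfshealth.jsp`, port 50070, and the pattern `NameNode` come from the slide.

```python
import re
import urllib.request

OK, CRITICAL = 0, 2  # Nagios exit codes


def evaluate(body, pattern):
    """Return (exit_code, message) depending on whether the pattern is found."""
    if re.search(pattern, body):
        return OK, f"HTTP OK - pattern {pattern!r} found"
    return CRITICAL, f"HTTP CRITICAL - pattern {pattern!r} not found"


def check_component(url, pattern, timeout=10):
    """Fetch a status page and test it against a regex.

    Any connection or HTTP error counts as a component failure.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
    except OSError as exc:  # URLError, HTTPError and timeouts all derive from OSError
        return CRITICAL, f"HTTP CRITICAL - {exc}"
    return evaluate(body, pattern)


# Offline demonstration with a made-up page body.
print(evaluate("<title>Hadoop NameNode</title>", "NameNode")[1])
```

Against the sample cluster, the call would look like `check_component("http://hadoopname1:50070/dfshealth.jsp", "NameNode")`.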
  25. Monitoring Battle Plan
     - Start with the basics (done)
       - Ping, disk
     - Add Hadoop-specific alarms (done)
       - check_data_node
     - Add JMX graphing
       - NameNodeOperations
     - Add JMX-based alarms
       - FilesTotal > 1,000,000 or LiveNodes < 50%
  26. JMX Graphing
     - Enable JMX
     - Import templates
  27. JMX Graphing
  28. JMX Graphing
  29. JMX Graphing
  31. Standard Java JMX
  32. Monitoring Battle Plan Thus Far
     - Start with the basics (done)
       - Ping, disk
     - Add Hadoop-specific alarms (done)
       - check_data_node
     - Add JMX graphing (done)
       - NameNodeOperations
     - Add JMX-based alarms
       - FilesTotal > 1,000,000 or LiveNodes < 50%
  33. Add JMX-Based Alarms
     - hadoop-cacti-jtg is flexible
       - Extend the fetch classes
       - Don't call output()
       - Write your own check logic
  34. Quick JMX Base Walkthrough
     - url, user, pass, and object are specified from the CLI
     - wantedVariables and wantedOperations are supplied by inheritance
     - fetch() and output() are provided
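The base-class pattern described in the walkthrough can be mimicked in Python to show its shape. This is an analogue for illustration only, not the hadoop-cacti-jtg code (which is Java): the class names, the injected `reader` callable standing in for a JMX connection, the object name, the port, and the sample values are all assumptions.

```python
class JMXBase:
    """Base fetcher: subclasses only declare which attributes they want."""

    wanted_variables = ()

    def __init__(self, url, user, password, object_name, reader):
        self.url = url
        self.user = user
        self.password = password
        self.object_name = object_name
        # reader(object_name, attribute) -> value; stands in for a JMX connection
        self.reader = reader

    def fetch(self):
        """Read every wanted attribute and return them as a dict."""
        return {name: self.reader(self.object_name, name)
                for name in self.wanted_variables}

    def output(self):
        """Format the fetched values as a Cacti-style name:value line."""
        return " ".join(f"{k}:{v}" for k, v in self.fetch().items())


class NameNodeStats(JMXBase):
    """Hypothetical NameNode fetcher: only the attribute list changes."""

    wanted_variables = ("FilesTotal", "BlocksTotal")


# Fake reader with made-up numbers so the sketch runs without a cluster.
fake_values = {"FilesTotal": 120000, "BlocksTotal": 150000}
stats = NameNodeStats(
    "service:jmx:rmi:///jndi/rmi://hadoopname1:10001/jmxrmi",  # port is assumed
    None, None,
    "example:type=NameNode",  # placeholder object name
    lambda obj, attr: fake_values[attr],
)
print(stats.output())
```

A Nagios-oriented subclass would call `fetch()` and apply its own thresholds instead of calling `output()`.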
  35. Extend for NameNode
  36. Extend for Nagios
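The check logic behind the JMX-based alarm from the battle plan (FilesTotal > 1,000,000 or LiveNodes < 50%) could look roughly like this. It is a sketch: the function name and message wording are assumptions, and the metric values are expected to come from whatever JMX fetch step is in use.

```python
OK, CRITICAL, UNKNOWN = 0, 2, 3  # Nagios exit codes


def check_namenode(files_total, live_nodes, total_nodes,
                   max_files=1_000_000, min_live_fraction=0.5):
    """Apply the two thresholds from the slides and return (code, message)."""
    if total_nodes <= 0:
        return UNKNOWN, "UNKNOWN - total node count not available"
    problems = []
    if files_total > max_files:
        problems.append(f"FilesTotal={files_total} exceeds {max_files}")
    live_fraction = live_nodes / total_nodes
    if live_fraction < min_live_fraction:
        problems.append(f"only {live_fraction:.0%} of nodes are live")
    if problems:
        return CRITICAL, "CRITICAL - " + "; ".join(problems)
    return OK, f"OK - FilesTotal={files_total}, {live_fraction:.0%} of nodes live"


# Hypothetical readings: 40 of 50 DataNodes live, 500,000 files.
print(check_namenode(500_000, 40, 50)[1])
```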
  37. Monitoring Battle Plan
     - Start with the basics (done)
       - Ping, disk
     - Add Hadoop-specific alarms (done)
       - check_data_node
     - Add JMX graphing (done)
       - NameNodeOperations
     - Add JMX-based alarms (done)
       - FilesTotal > 1,000,000 or LiveNodes < 50%
  38. Review
     - File system growth
       - Size
       - Number of files
       - Number of blocks
       - Ratios
     - Utilization
       - CPU/memory
       - Disk
     - Email (nightly)
       - FSCK
       - DFSADMIN
  39. The Future
     - JMX coming to JobTracker and TaskTracker (0.21)
       - Collect and graph jobs running
       - Collect and graph map/reduce per node
       - Profile specific jobs in Cacti?