Trevor McDonald - Nagios XI Under The Hood - What happens when a check is run? What are the parts that move behind the scenes to turn a service check into a notification? In this talk, Trevor will walk through the check process from start to finish, giving an overview of the components involved at each step.
2. Who am I?
●Support Manager, Nagios Enterprises
–tmcdonald on Support Forum
–https://github.com/tmcnag
–NWC2014 - Nagduino
●Non-Work
–World languages
–Computer security
3. Intro
●Scope
–Nagios XI 5 w/ Nagios Core 4.1.1
–ndomod in use
–Bulk mode with NPCD
●Out of scope (but still mostly valid)
–Pre-2014 XI
–Pre-4 Core
–mod_gearman / DNX / remote agents
5. Overview
●Check is run
–Exit code & status output stashed
–Performance data split off
–Event handlers and/or notifications launched
●Perfdata processed
–Multi-step process
●Reports, Web GUI, etc.
7. Check is Run
●Check hits next_check_time (status.dat)
●execvp("/path/to/plugin", args);
●Results are reaped and passed along
●They look like this
–PING OK - Packet loss = 0%, RTA = 0.40 ms|rta=0.401000ms;400.000000;800.000000;0.000000 pl=0%;40;80;0
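The exec-and-reap step can be sketched in Python. This is an illustration only, not Core's C implementation: `run_check` and `STATES` are hypothetical names, and a small stand-in command takes the place of a real plugin.

```python
import subprocess
import sys

# Standard plugin exit codes and the states they map to.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_check(argv, timeout=60):
    """Run a plugin and return (state, first line of its output)."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # Simplification: this sketch reports UNKNOWN; Core's handling differs.
        return "UNKNOWN", "(check timed out)"
    lines = proc.stdout.splitlines()
    output = lines[0] if lines else "(no output)"
    # Any exit code outside 0-3 is treated as UNKNOWN in this sketch.
    return STATES.get(proc.returncode, "UNKNOWN"), output


if __name__ == "__main__":
    # Stand-in for a real plugin: prints one status line and exits 2.
    fake_plugin = [
        sys.executable,
        "-c",
        "import sys; print('PING CRITICAL - Packet loss = 100%'); sys.exit(2)",
    ]
    print(run_check(fake_plugin))
```

The exit code alone decides the state; the text is carried along unchanged for later steps to split and store.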
8. Results are Processed
●Exit Code
●Status Output
–Performance Data is included here, everything after the “|” character
●Not much* done with these
–*That I will be covering today
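As a rough sketch of how the status output and performance data relate, the line can be split at the first “|” and each metric read as label=value[UOM];warn;crit;min;max. The function names and the regular expression are assumptions made for illustration, not code taken from Nagios.

```python
import re

# label=value[UOM];[warn];[crit];[min];[max]  (a label may be single-quoted)
PERF_RE = re.compile(
    r"(?:'([^']+)'|([^\s=']+))=([-+]?\d+(?:\.\d+)?)([^\s;]*)((?:;[^\s;]*)*)"
)


def split_output(line):
    """Return (status text, raw perfdata), split at the first '|'."""
    status, _, perf = line.partition("|")
    return status.strip(), perf.strip()


def parse_perfdata(perf):
    """Parse raw perfdata into {label: {value, uom, warn, crit, min, max}}."""
    metrics = {}
    for m in PERF_RE.finditer(perf):
        label = m.group(1) or m.group(2)
        fields = m.group(5).split(";")[1:]  # drop the empty piece before the first ';'
        fields += [""] * (4 - len(fields))  # pad thresholds that were omitted
        warn, crit, lo, hi = fields[:4]
        metrics[label] = {
            "value": float(m.group(3)),
            "uom": m.group(4),
            # Thresholds stay as strings because they may be ranges.
            "warn": warn, "crit": crit, "min": lo, "max": hi,
        }
    return metrics


if __name__ == "__main__":
    line = ("PING OK - Packet loss = 0%, RTA = 0.40 ms|"
            "rta=0.401000ms;400.000000;800.000000;0.000000 pl=0%;40;80;0")
    status, perf = split_output(line)
    print(status)
    print(parse_perfdata(perf))
```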
9. Exit Code/Status Output
●Goes many places:
–status.dat
–retention.dat
–nagios.log if non-OK
–syslog (optional, enabled by default)
–ndo database (optional*, enabled by default)
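To give a feel for what lands in status.dat, here is a minimal reader for its block layout (a type name, an opening brace, key=value lines, a closing brace). The sample values are made up for illustration and `parse_status_dat` is a hypothetical helper.

```python
def parse_status_dat(text):
    """Return a list of dicts, one per 'type { key=value ... }' block."""
    blocks = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if current is None:
            if line.endswith("{"):
                current = {"_type": line[:-1].strip()}
        elif line == "}":
            blocks.append(current)
            current = None
        elif "=" in line:
            # Split on the first '=' only; plugin output may contain more.
            key, _, value = line.partition("=")
            current[key] = value
    return blocks


if __name__ == "__main__":
    sample = """
servicestatus {
    host_name=web01
    service_description=PING
    current_state=0
    plugin_output=PING OK - Packet loss = 0%, RTA = 0.40 ms
    }
"""
    for block in parse_status_dat(sample):
        print(block["_type"], block["host_name"], block["current_state"])
```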
10. Performance Data
●Split from status output after “|” character
●Handled by
–Nagios
–Cron
–NPCD
●Also goes many places:
–Flat files
–XML files
–Databases
–RRD files
15. Nagios
●Stores in:
–…/var/[host|service]-perfdata
●Using the form defined by:
–[host|service]_perfdata_file_template
●Then every 15 seconds (by default)
–[host|service]_perfdata_file_processing_interval
●Nagios will run:
–[host|service]_perfdata_file_processing_command
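A small sketch of how a perfdata file template turns into one line of the perfdata file. The template below is an assumption, written in the KEY::$MACRO$ style commonly used for bulk-mode processing; the real value is whatever nagios.cfg defines on a given system. `render` is a hypothetical helper.

```python
import re

# Assumed example template; check nagios.cfg for the real one.
TEMPLATE = (
    r"DATATYPE::SERVICEPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\t"
    r"SERVICEDESC::$SERVICEDESC$\tSERVICEPERFDATA::$SERVICEPERFDATA$"
)

MACRO_RE = re.compile(r"\$(\w+)\$")


def render(template, macros):
    """Expand \\t and \\n escapes, then substitute $MACRO$ values."""
    line = template.replace(r"\t", "\t").replace(r"\n", "\n")
    # Unknown macros become empty strings in this sketch.
    return MACRO_RE.sub(lambda m: str(macros.get(m.group(1), "")), line)


if __name__ == "__main__":
    print(render(TEMPLATE, {
        "TIMET": 1449072000,
        "HOSTNAME": "web01",
        "SERVICEDESC": "PING",
        "SERVICEPERFDATA": "pl=0%;40;80;0",
    }))
```

One such line is appended per check result; the processing command then moves the accumulated file along at each interval.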
18. NPCD
●Does the real processing legwork
●Every 15 seconds by default, runs:
–…/libexec/process_perfdata.pl
●Which places processed files into:
–…/share/perfdata/<hostname>/<servicedesc>.rrd
–…/share/perfdata/<hostname>/<servicedesc>.xml
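The processing step can be pictured as: read one spooled record, pick out the host and service, and derive the pair of output files. The sketch below assumes tab-separated KEY::value records and a simple file-name cleanup rule; both are assumptions, and `perfdata_root` is a hypothetical parameter standing in for the perfdata directory. It only computes names and writes nothing.

```python
import posixpath
import re


def parse_bulk_line(line):
    """Split one tab-separated KEY::value record into a dict."""
    fields = {}
    for part in line.rstrip("\n").split("\t"):
        key, sep, value = part.partition("::")
        if sep:
            fields[key] = value
    return fields


def clean(name):
    # Assumed cleanup rule for file names; the real script's rule may differ.
    return re.sub(r"[^A-Za-z0-9_.-]", "_", name)


def target_files(perfdata_root, fields):
    """Return the (.rrd, .xml) paths a service record would map to."""
    base = posixpath.join(
        perfdata_root, clean(fields["HOSTNAME"]), clean(fields["SERVICEDESC"])
    )
    return base + ".rrd", base + ".xml"


if __name__ == "__main__":
    record = ("DATATYPE::SERVICEPERFDATA\tTIMET::1449072000\tHOSTNAME::web01\t"
              "SERVICEDESC::Current Load\tSERVICEPERFDATA::load1=0.5;5;10;0")
    print(target_files("perfdata", parse_bulk_line(record)))
```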
20. Event Handlers/Notifications
●Standard Nagios logic takes over
●Event Handlers run on every state change
–Some only take action for certain states
●Notifications run once max_check_attempts is reached (HARD state)
–XI Notification process is…
●Complicated
●Somewhat proprietary
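The standard logic can be approximated with a small state tracker. This is a deliberately simplified model, not Core's actual code: it ignores flapping, downtime, notification intervals and escalations, and it treats every recovery as going straight back to a HARD OK. `ServiceTracker` is a hypothetical name.

```python
class ServiceTracker:
    """Toy model of SOFT/HARD states for a single service."""

    def __init__(self, max_check_attempts=3):
        self.max_check_attempts = max_check_attempts
        self.state = "OK"
        self.state_type = "HARD"
        self.attempt = 1

    def process(self, new_state):
        """Feed one check result; return the actions it would trigger."""
        actions = []
        if new_state == "OK":
            if self.state != "OK":
                actions.append("event_handler")
                if self.state_type == "HARD":
                    actions.append("notification")  # recovery notice
            self.state, self.state_type, self.attempt = "OK", "HARD", 1
        elif self.state == "OK" or self.state_type == "SOFT":
            # New problem, or a retry of a problem that is still SOFT.
            self.attempt = 1 if self.state == "OK" else self.attempt + 1
            self.state = new_state
            actions.append("event_handler")
            if self.attempt >= self.max_check_attempts:
                self.state_type = "HARD"
                actions.append("notification")
            else:
                self.state_type = "SOFT"
        elif new_state != self.state:
            # HARD problem changing to a different problem state.
            self.state = new_state
            actions += ["event_handler", "notification"]
        return actions


if __name__ == "__main__":
    tracker = ServiceTracker(max_check_attempts=3)
    for result in ["CRITICAL", "CRITICAL", "CRITICAL", "CRITICAL", "OK"]:
        print(result, tracker.state_type, tracker.process(result))
```

In this model the event handler fires on each SOFT retry as well as on the HARD transition, which is why handlers usually inspect the state and attempt number before acting.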
21. Notifications
●Contacts have notification commands
–notify-[host|service]-by-email
–xi_[host|service]_notification_handler
●Core basically just calls sendmail with args
●XI can
–Use SMTP
–Set importance
25. Notifications
●Pulls user info from db
–MySQL in XI 5
–Postgres in 2014 and older
●Formatted nicely
–Configurable
●Sent via PHPMailer
26. Reports/GUI
●Pull primarily from these files
–nagios.log
–Archived logs in …/var/archives/
–RRD files in …/share/perfdata/…
●And from many db tables, such as
–nagios_acknowledgments
–nagios_statehistory
–nagios_notifications
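As an illustration of the kind of data a report can pull out of nagios.log, the sketch below parses SERVICE ALERT entries, which record state changes as semicolon-separated fields after an epoch timestamp. The sample line is made up and `parse_service_alert` is a hypothetical helper; XI's own report code is not shown here.

```python
import re

# [epoch] SERVICE ALERT: host;service;state;state_type;attempt;output
ALERT_RE = re.compile(
    r"^\[(\d+)\] SERVICE ALERT: ([^;]*);([^;]*);([^;]*);([^;]*);(\d+);(.*)$"
)


def parse_service_alert(line):
    """Return a dict for a SERVICE ALERT line, or None for any other line."""
    m = ALERT_RE.match(line.strip())
    if not m:
        return None
    ts, host, service, state, state_type, attempt, output = m.groups()
    return {
        "time": int(ts),
        "host": host,
        "service": service,
        "state": state,
        "state_type": state_type,
        "attempt": int(attempt),
        "output": output,  # may itself contain ';'
    }


if __name__ == "__main__":
    sample = ("[1449072000] SERVICE ALERT: web01;PING;CRITICAL;HARD;3;"
              "PING CRITICAL - Packet loss = 100%")
    print(parse_service_alert(sample))
```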
28. Thanks To
●Scott Wilkerson
–Explaining finer points of XI, general presentation advice
●John Frickson
–Clearing up Core logic
●Amy Lohmann
–Formatting and consistency
●Jesse Olson
–Guinea pig