Mike Weber's presentation on using Nagios and High Availability.
The presentation was given during the Nagios World Conference North America held Sept 25-28th, 2012 in Saint Paul, MN. For more information on the conference (including photos and videos), visit: http://go.nagios.com/nwcna
2. Alternatives
Daily Image Creation for Restore (VMWare, etc.)
- lose parts of history
- create gaps in monitoring with image creation
rsync to Synchronize Servers
- requires IP address, hostname changes
- requires modification of nagios.cfg
- assumes Master will never be misconfigured
- rsync can use a lot of resources
Clustered Nagios Server
2012 2
8. High Availability: Outline of Goals
Create Master/Slave Relationship
Master Sends History to the Slave
Slave Does Not Run Host or Service Checks or Send Notifications
Slave Monitors Master via Script
Slave Enables Host Checks, Service Checks and Notifications when Master is Down
Slave Disables All Checks when Master is Up
Simplicity
11. Step #1: Clone Master to Slave
Back Up Master Databases
- MySQL databases
- Postgres database
Back Up Master Files
- /usr/local/nagios
- /usr/local/nagiosxi
Install all dependencies for plugins
Enable Access from Slave on all devices
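The backup part of this step could be scripted roughly as follows. This is a minimal sketch, not the presenter's procedure: the database names (nagios, nagiosql, nagiosxi), the Postgres user, the BACKUP_DIR location and the run helper are all assumptions to adjust for the actual installation. It defaults to a dry run that only prints the commands.

```shell
#!/bin/bash
# Sketch of a Master backup. Everything below is an assumption except the
# two directories, which come from the slide.
BACKUP_DIR=${BACKUP_DIR:-/tmp/nagios-backup}   # hypothetical target directory
DRY_RUN=${DRY_RUN:-1}                          # 1 = print commands only

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

backup_master() {
    run mkdir -p "$BACKUP_DIR"
    # MySQL databases (names assumed; credentials omitted)
    run mysqldump --databases nagios nagiosql --result-file="$BACKUP_DIR/mysql.sql"
    # Postgres database (name and user assumed)
    run pg_dump -U nagiosxi -f "$BACKUP_DIR/nagiosxi.sql" nagiosxi
    # Files listed on the slide
    run tar -czf "$BACKUP_DIR/files.tar.gz" /usr/local/nagios /usr/local/nagiosxi
}

backup_master
```

Run it once as-is to review the printed commands, then with DRY_RUN=0 on the Master to create the dumps and the archive for copying to the Slave.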
12. Step #2: Disable Slave
Edit nagios.cfg
execute_host_checks=0
execute_service_checks=0
enable_notifications=0
Save and Restart Nagios
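Before relying on the Slave, it may be worth confirming that all three directives really are set to 0. The function below is a small sketch for that check; the name slave_is_passive is hypothetical, and it assumes each directive appears once, uncommented, at the start of a line.

```shell
#!/bin/bash
# Succeeds only if all three directives are present and set to 0
# in the Nagios main config file given as $1.
slave_is_passive() {
    local cfg=$1 key
    for key in execute_host_checks execute_service_checks enable_notifications; do
        grep -q "^${key}=0[[:space:]]*$" "$cfg" || return 1
    done
}
```

On the Slave this could be called as: slave_is_passive /usr/local/nagios/etc/nagios.cfg && echo "slave is passive"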
13. Step #3: Enable NSCA
Master Sends History via NSCA
- edit nagios.cfg (save and restart Nagios)
obsess_over_hosts=1
obsess_over_services=1
Slave Maintains History via NSCA
- install NSCA daemon on slave
- allow connections from Master
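To test the NSCA path by hand, a passive result can be built in the tab-delimited format that send_nsca reads on standard input. The helper names below are hypothetical and the slides do not show the Master's actual obsess commands, so treat this as a sketch of the input format only.

```shell
#!/bin/bash
# One passive service check result:
#   host<TAB>service description<TAB>return code<TAB>plugin output
format_service_result() {
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4"
}

# Host check results omit the service description field:
#   host<TAB>return code<TAB>plugin output
format_host_result() {
    printf '%s\t%s\t%s\n' "$1" "$2" "$3"
}
```

A manual test from the Master might then look like: format_service_result web01 HTTP 0 "HTTP OK" | send_nsca -H SLAVE_IP -c SEND_NSCA_CFG, where web01 and HTTP are made-up names, and SLAVE_IP and SEND_NSCA_CFG stand for the Slave's address and the Master's outbound config file.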
15. Master: Outbound Config
File Found in /usr/local/nagios/etc
send_nsca-192.168.5.211.cfg
# CONFIGURED BY NAGIOS XI
password=LMb674FcsswP
encryption_method=3
16. Slave: NSCA Config
# default: on
# description: NSCA (Nagios Service Check Acceptor)
service nsca
{
    flags           = REUSE
    socket_type     = stream
    wait            = no
    user            = nagios
    group           = nagios
    server          = /usr/local/nagios/bin/nsca
    server_args     = -c /usr/local/nagios/etc/nsca.cfg --inetd
    log_on_failure  += USERID
    disable         = no
    only_from       = 127.0.0.1 192.168.5.211
}
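The password and encryption method in the Master's send_nsca config have to agree with the password and decryption_method in the Slave's nsca.cfg, otherwise the daemon cannot decode what it receives. The sketch below compares the two files once they are side by side on one machine; cfg_value and nsca_settings_match are hypothetical helper names.

```shell
#!/bin/bash
# Print the value of key $2 from the key=value config file $1.
# Commented lines are ignored; if the key repeats, the last one wins.
cfg_value() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" | tail -n 1
}

# Succeeds if the send_nsca config ($1) and the nsca.cfg ($2) agree on
# the password and on the encryption/decryption method.
nsca_settings_match() {
    local send_pw nsca_pw send_enc nsca_dec
    send_pw=$(cfg_value "$1" password)
    nsca_pw=$(cfg_value "$2" password)
    send_enc=$(cfg_value "$1" encryption_method)
    nsca_dec=$(cfg_value "$2" decryption_method)
    [ -n "$send_pw" ] && [ "$send_pw" = "$nsca_pw" ] &&
        [ -n "$send_enc" ] && [ "$send_enc" = "$nsca_dec" ]
}
```

For example, after copying the Master's outbound config to the Slave: nsca_settings_match ./send_nsca-copy.cfg /usr/local/nagios/etc/nsca.cfg || echo "NSCA settings differ" (the copy's file name is made up).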
18. Step #4: Slave Monitor Master via SSH
Create SSH Keys on Slave
- push public key to master
Create authorized_keys file on Master
Implement SSH script to check Master
- passwordless login
- set on a cron job (check every minute)
- script detects status of Master
- script turns checks and notifications on/off
19. Create Key Pair
su - nagios
mkdir .ssh
cd .ssh
ssh-keygen -b 1024 -f id_dsa -t dsa -N ''
Generating public/private dsa key pair.
Your identification has been saved in id_dsa.
Your public key has been saved in id_dsa.pub.
The key fingerprint is:
61:23:17:2d:83:d8:d9:f9:87:2d:e1:6d:e6:3d:cb:5c nagios@slxi
The key's randomart image is:
+--[ DSA 1024]----+
| o +.o |
| . + =.o |
| . == = |
| + o= * |
| S *. |
| . o E|
| o+|
| + |
| |
+-----------------+
20. Push Public Key to nagios user on Master
scp id_dsa.pub nagios@192.168.5.211:/home/nagios/.ssh/slave
This means that the nagios user must have a /home/nagios/.ssh
directory. The public key name is changed to “slave” to avoid
overwriting any keys.
On the master (as the nagios user):
cat slave >> authorized_keys
chmod 644 authorized_keys
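If the passwordless login still prompts for a password, file permissions are a common cause: with OpenSSH's StrictModes setting (on by default) the server ignores authorized_keys when that file or the .ssh directory is writable by group or others. The sketch below checks for that on the Master; both function names are hypothetical.

```shell
#!/bin/bash
# Succeeds if path $1 has neither the group nor the other write bit set.
# Reads the mode string from "ls -ld", e.g. "-rw-r--r--".
not_group_world_writable() {
    local perms
    perms=$(ls -ld -- "$1" 2>/dev/null) || return 2
    case "${perms:5:1}${perms:8:1}" in
        *w*) return 1 ;;
    esac
    return 0
}

# Check the .ssh directory given as $1 and the authorized_keys file in it.
ssh_perms_ok() {
    local dir=$1 path
    for path in "$dir" "$dir/authorized_keys"; do
        not_group_world_writable "$path" || return 1
    done
}
```

As the nagios user on the Master: ssh_perms_ok /home/nagios/.ssh || echo "fix permissions on .ssh or authorized_keys"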
21. Slave: Cron Job
# /etc/cron.d/nagiosxi: crontab fragment for nagiosxi
* * * * * nagios /bin/bash /usr/local/nagios/libexec/eventhandlers/check_master.sh
22. Slave: check_master.sh
#!/bin/bash
# Runs on the Slave from cron. Checks whether Nagios is running on the
# Master and turns the Slave's checks and notifications on or off.
masterip=192.168.5.210
cfg=/usr/local/nagios/etc/nagios.cfg

# Master is up: the Slave stops checking and notifying.
function disable () {
    sed -i 's/execute_host_checks=1/execute_host_checks=0/' "$cfg"
    sed -i 's/execute_service_checks=1/execute_service_checks=0/' "$cfg"
    sed -i 's/enable_notifications=1/enable_notifications=0/' "$cfg"
    /sbin/service nagios reload
}

# Master is down: the Slave takes over checks and notifications.
function enable () {
    sed -i 's/execute_host_checks=0/execute_host_checks=1/' "$cfg"
    sed -i 's/execute_service_checks=0/execute_service_checks=1/' "$cfg"
    sed -i 's/enable_notifications=0/enable_notifications=1/' "$cfg"
    /sbin/service nagios reload
}

# Count "running" lines in the Master's init script status output.
# If the SSH connection fails, the count is 0 and the Slave takes over.
nagpid=$(ssh "nagios@$masterip" /etc/init.d/nagios status | grep -c running)

if [ "$nagpid" -eq 0 ]; then
    echo "Starting Checks"
    enable
else
    echo "Stopping Checks"
    disable
fi
exit 0
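The script above reloads Nagios on every cron run, even when the config is already in the desired state. One possible refinement, sketched here and not part of the original presentation, is to reload only when the file actually changed. The helper name set_checks is hypothetical, and GNU sed is assumed.

```shell
#!/bin/bash
# Set the three directives in config file $1 to value $2 (0 or 1).
# Returns 0 if the file changed, 1 if it was already in that state.
set_checks() {
    local cfg=$1 value=$2 before after
    before=$(cksum < "$cfg")
    sed -i -E "s/^(execute_host_checks|execute_service_checks|enable_notifications)=[01]/\1=${value}/" "$cfg"
    after=$(cksum < "$cfg")
    [ "$before" != "$after" ]
}
```

Inside check_master.sh the enable branch could then become: if set_checks /usr/local/nagios/etc/nagios.cfg 1; then /sbin/service nagios reload; fi, with the disable branch passing 0 instead. That skips the reload on runs where nothing changed.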
23. Assumptions: Based on Simplicity
Mature Implementation
- set up once the network implementation is largely complete
Master Down Short Amount of Time
- slave does not send history back to Master when Master returns
Master and Slave Independent of Updates
- no rsync
- guarantees integrity of one system
29. NSCA: Version 2.9.1
Plugin Buffer is Larger
- NSCA server receives OK
- NSCA sending adds wrong information
Replace send_nsca with Version 2.7.2 on Master
- located in /usr/local/nagios/libexec