David Stern's presentation on The Nagios Light Bar.
The presentation was given during the Nagios World Conference North America held Sept 20-Oct 2nd, 2013 in Saint Paul, MN. For more information on the conference (including photos and videos), visit: http://go.nagios.com/nwcna
13. The network MUST be active before you power on the unit
It might be a smart idea to set the light-bar to use a DHCP address
It would be a smart idea to monitor that the light-bar is available
14. If I can control the light-bar from the web, why not use WGET?
15. The Secret Sauce; undocumented
http://light-bar/cmd.cgi?action=ST&t=A2&a=1
makes the light-bar go beep
http://light-bar/cmd.cgi?action=ST&t=A2&a=0
turns off the noise
http://light-bar/cmd.cgi?action=ST&t=GR&a=1
makes the green light go on
http://light-bar/cmd.cgi?action=ST&t=GR&a=0
turns off the green light
Substitute OR for GR to affect the orange (yellow) light
Substitute RE for GR to affect the red light
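The URL pattern above can be sketched in Python rather than WGET. This is a minimal sketch, not Avtech's documented interface: the host name `light-bar` and the `cmd.cgi` parameters come from the slide, while the function names, the `TARGETS` table, and the timeout are assumptions made for illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Target codes taken from the slide: A2 = beeper, GR = green, OR = orange, RE = red.
TARGETS = {"beep": "A2", "green": "GR", "orange": "OR", "red": "RE"}


def lightbar_url(target: str, on: bool, host: str = "light-bar") -> str:
    """Build the cmd.cgi URL that switches one light (or the beeper) on or off."""
    query = urlencode({"action": "ST", "t": TARGETS[target], "a": 1 if on else 0})
    return f"http://{host}/cmd.cgi?{query}"


def set_lightbar(target: str, on: bool, host: str = "light-bar", timeout: float = 5.0) -> int:
    """Request the URL, much as WGET would, and return the HTTP status code."""
    with urlopen(lightbar_url(target, on, host), timeout=timeout) as response:
        return response.status
```

For example, `set_lightbar("green", True)` would request the same URL the slide lists for turning the green light on. If the light-bar has authentication enabled, the request would also need credentials, which this sketch leaves out.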
16. Getting nagios status
We can get the nagios status from the Service page:
http://nagios-server/nagios/cgi-bin/status?host=all
Just search it for “serviceTotalsPROBLEMS”
N.B. You may need to insert authentication information in the URL:
http://nagiosadmin:nag-password@nagios-server/nagios/cgi-bin/status?host=all
You can use the same format for the light-bar authentication
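The status check can be sketched in Python as follows. It assumes the Nagios web interface uses HTTP Basic authentication and sends the credentials as a header instead of embedding them in the URL; the URL is the one from the slide, and the function names are hypothetical. The only thing it looks for is the `serviceTotalsPROBLEMS` marker mentioned above.

```python
import base64
from urllib.request import Request, urlopen

# URL as given on the slide; host name and CGI path depend on your install.
STATUS_URL = "http://nagios-server/nagios/cgi-bin/status?host=all"


def basic_auth_header(user: str, password: str) -> dict:
    """Return an HTTP Basic Authorization header for the given credentials."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def fetch_status_page(url: str = STATUS_URL, user: str = "", password: str = "",
                      timeout: float = 10.0) -> str:
    """Download the status page, adding credentials only if a user is given."""
    headers = basic_auth_header(user, password) if user else {}
    with urlopen(Request(url, headers=headers), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def has_service_problems(page: str) -> bool:
    """True if the page contains the marker the slide says to search for."""
    return "serviceTotalsPROBLEMS" in page
```

A cron-driven script might call `has_service_problems(fetch_status_page(user="nagiosadmin", password="nag-password"))` and use the result to decide which light to switch on.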
17. Mission Creep
How about if we get the light-bar to beep the first time a new alert occurs?
And since we don’t use Nagios WARNING conditions, let’s use the yellow light to indicate unacknowledged alerts
Let’s put it all together…
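The behavior described on this slide could be sketched as a small decision function. This is an illustration under stated assumptions, not the presenter's script: the flag-file path is hypothetical, how unacknowledged alerts are detected is left to the caller, and keeping the red light on for any problem is a guess at the intended behavior.

```python
from pathlib import Path

# Hypothetical flag-file location; any path writable by the cron user would do.
FLAG_FILE = Path("/tmp/lightbar_alert.flag")


def plan_lightbar(has_problems: bool, has_unacknowledged: bool,
                  flag_file: Path = FLAG_FILE) -> dict:
    """Decide the desired on/off state of each light and the beeper.

    The flag file records that an alert is already known, so the beeper
    is requested only on the first run that sees a problem.
    """
    first_alert = has_problems and not flag_file.exists()
    if has_problems:
        flag_file.touch()                  # remember that we have already beeped
    else:
        flag_file.unlink(missing_ok=True)  # all clear: re-arm the beeper
    return {
        "red": has_problems,
        "orange": has_problems and has_unacknowledged,
        "green": not has_problems,
        "beep": first_alert,
    }
```

Each key in the returned dictionary maps to one on/off command for the light-bar. With this logic the beeper is switched off again on the next run after the first alert; how long it should actually sound is a choice the slides do not specify.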
21. Configuring “hierarchical” nagios
• Back up Nagios before and after these changes
• Install a Nagios server in each lab/sub-site
• Edit side.{html,php}: add a stanza for each sub-site and set refresh=300
• Tag each Nagios sub-site: Main.{html,php} and status.c status.cgi
• Modify the light-bar cron job to check each sub-site, swapping red and green dots as needed
The light-bar is a device that visually replicates Nagios states. Each light-bar represents a separate network. There was an earlier project called Nampel (Nagios Ampel; Ampel is the German word for traffic light). It involved a great deal of hardware engineering: soldering things onto the motherboard of a dedicated machine.
Sitting in our offices, we have no way of knowing what is happening on a remote network. It’s also eye candy.
Avtech.com sells a number of factory- or industrial-grade sensors, including thermometers and water, light, and power sensors. The light-bar and all sensors require a base unit. The unit pictured has ports for network and power on one side and ports for the light-bar on the other. The base unit also happens to have a built-in thermometer. The ports on the far left of the unit are for additional sensors. Avtech sells thermistors with rather long cables (50 feet) that plug into these ports on the base unit. If you are going to monitor temperature, it’d be a good idea to also install nagiosgraph to see temperature trends in your data center. Data from the thermistors can be retrieved via SNMP. The light-bar is meant to be controlled by SNMP. We had some interesting talks with our security people, convincing them it’s OK to make a hole in the wall of a closed area for the light-bar cable.
Here’s an unencumbered view of the front of the base unit, showing the power, network, and sensor ports.
Here’s a view of the back of the base unit, showing where and how the light-bar plugs in. The smaller ribbon-cable segment controls sound; the larger one controls the lights.
The equipment includes Discovery software for both Linux and Windows. You can specify a range or an individual IP address. The default address the device comes up on is a non-routable class C address. The host you run the software on must be on the same subnet as the base unit. As that’s not likely to be the case when you first configure it, you need to attach the base unit to a laptop via a crossover cable. And since most server-class computers have multiple NICs, you can make this your operational configuration, i.e., permanently connect the light-bar to one of several NICs on a server via crossover cable. Highlight the device to configure in this GUI and click on the Web button to go to the base unit’s home webpage.
By mouse-clicking over the light segments, you can turn each light on and off. Note the sound icons below the light-bar: one is for a slow stream of beeps, the other for a quicker stream of beeps. Note that sound is currently turned off. Also note the built-in thermometer and the greyed-out areas for two more sensors. From the main STATUS page, we can click on the Settings tab.
Probably the most important settings are the network information. You probably only need to set the IP, gateway, and netmask.
In addition to network settings, you must choose the appropriate signal tower device (red, yellow, and green lights with sound, or just red and green lights).
Depending on your environment or security mindset, you may wish to add authentication information.
Also optional: you can set the time zone and temperature units. When you finish with your settings, click “Save Settings”. The light-bar then reboots automatically. During reboot, it tests all light segments and sound for about 20 seconds. There is a volume control on the light-bar, but it cannot turn the sound off completely, only deaden it.
After a power outage (planned or otherwise), the light-bar is likely to fail. The network port MUST be active before you plug in power, so once the network is back, just cycling power will get the light-bar and base unit back on the network. As mentioned, on power-up the light-bar turns on all lights and beeps, ideally for no more than 30 seconds. But if it can’t reach the net, this condition may continue. We used a static IP. At first, I thought this was a mistake; perhaps it would auto-recover from an outage if we used DHCP. But this is really a timing issue: after an outage, the light-bar is almost certainly going to power up before the network is available anyway. Sometimes the light-bar may disappear from the ’net, so it’d be a good idea to have a Nagios ping test to ensure it’s accessible. Obviously, if the test fails, it can’t notify you via the light-bar; it’s assumed that at some point you’ll get onto the closed network and look at the webpage. If the light-bar loses connectivity to the network, it will retain the last state it knew about, e.g., just the GREEN light lit.
I already had another project in mind associated with this where WGET would be useful, so I decided to control the light-bar using WGET, even though the light-bar is SUPPOSED to be controlled by SNMP. Even as a child, I could never color within the lines.
Although not in the documentation, this was obtained from Avtech.
The first URL is the action from clicking on the Nagios SERVICES button.
Setting a flag-file indicates whether this is the first alert. The alarm rings ONLY the first time a status goes red. Our shop doesn’t use Nagios warnings: a condition is either OK or requires attention. A disk that’s 80% full will likely fill completely soon.
This script is run via cron on the Nagios server every 5 minutes.
Initially, we had a Nagios install only for Core services. Lab managers resisted having a Nagios server in their labs; they didn’t care if a user turned off a host. Newer security requirements, however, mandated that we account for ALL time gaps in audit records. So how about if we had remote Nagios servers in each lab that somehow communicate back to the Core server? The challenge was to make the Core server aware of the other labs without throwing a red light. It would also be nice if we could zoom into the remote labs from the Core server to see the specific problem. This is a WORK IN PROGRESS, so I don’t have any source code to show. We’ve been using this for several months now.
Note the lower left frame. I basically copied and tailored a stanza from above, and also added an HTML tag to refresh within 5 minutes. If there is a green dot next to the lab, everything is fine. If there is a red dot, there’s an ALERT. If you mouse-click on the lab in question, the right frame will zoom into that lab, showing the specific problem. Make sure to back up your files (side.html, side.php) before a Nagios upgrade or you’ll lose your work. This is still run by the light-bar cron job.
Once you finish these steps, all that remains is deciding how to present the data. Should you zoom into the main Nagios page of each sub-site, or zoom into the service page?
Here’s what it would look like if you zoomed into the sub-site’s Main page. Clicking on anything in the rightmost frame would zoom into that item on the sub-site. Clicking on anything in the leftmost frame would zoom into the Core.
Others preferred the clickable sub-site link going directly to the Service page of the sub-site. Clicking on services (or anything above the “Labs” stanza) in the leftmost frame returns you to the Core Nagios server main page. But this too was confusing. The winning combination was changing Target="MAIN" in the side frame to Target="_Blank", which opens a new tab or window into the sub-site. Operationally, when you see a red dot on a sub-site on the Nagios Core page, you click the link, which opens a tab to the sub-site. You look around to identify the problem, then click the “X” to the right of the tab to close it, and you’re back at the Nagios Core page.
Sysadmins love Nagios; it’s very extensible. It makes us look like wizards. A parting thought: sysadmins tend to be a cocky bunch. After all, we’re doing something new every single day. But we only got to be this good because of those who came before us.