Monitoring with Open Monitoring Distro



        TONIGHT’S OUTLINE:


         1.   Overview of Nagios
         2.   Check_MK
         3.   What is the “Open Monitoring Distribution”?
         4.   Operating a monitoring system


         Kelvin Vanderlip
         Oracle Linux systems administrator, Sunrider International, Torrance
         kelvin@vanderlip.org




Kel Vanderlip 3-1-2012 UUASC-LA                                                 1


       A thought for the night:


      Better to remain silent and be thought a
      fool than to speak out and remove all
      doubt.

      -- Abraham Lincoln (also attr. Confucius)








     In the beginning…








     Why do you care about monitoring?


     • You chose a job in which success depends on hard disks,
     NFS, DNS, DHCP, NIS, mgetty, cron jobs, postfix, routing,
     FTP, swap space, fans, UPS systems, switches, CPU
     registers…








        So you ask your sole staffer, “Is it running?”

        He says “I don’t know. Can I install NetSaint?”






 Time passes:




    “NetSaint is not affiliated with World Wide Digital Security, Inc. (WWDSI);
    Richard S. Carson and Associates, Inc;
    and the marks WEB SAINT, SAINT, SAINTWRITER, SAINTEXPRESS,
    and SAINTBASIC owned by Richard S. Carson and Associates”





Meet Ethan Galstad:




  “This website stands as a testament to a long-running Open Source project
  that began with a simple idea in my mind. I had no inkling of the future success
  that NetSaint (and later Nagios) would come by. I almost never released it to the
  OSS community, but thank goodness I did. For without the constant flow of ideas
  from NetSaint and Nagios users, the project would have died off a long time ago.
  Cheers to everyone in the community who has participated in this project at
  some point in their life. My hat is off to you...
  -Ethan Galstad: Creator, Developer, Founder of NetSaint, Nagios, and Nagios Enterprises
  -and happy participant in a wider movement”



  As I said, how do you get started in the monitoring business?








           Your server room grows, and you are still asking yourself: “Is it still working?”




   All about Nagios:


  Nagios is a scheduling engine. It is written in C. It runs on Linux. It is packaged as an RPM and a DEB.

  Input:
            Text configuration files (lots and lots!)
  Output:
            Schedule many forks to run external monitoring applications,
            some locally, some on remote servers.
  Input:
            Each called monitoring application returns status and performance information
  Output:
            status.dat, a “snapshot” text file kept up to date several times a minute
            describing the last state for each thing Nagios is checking
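
The input/output loop above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Nagios itself is implemented (Nagios is written in C). The helper name `run_check` and the stand-in plugin are assumptions made for this example. What is standard is the plugin convention: exit code 0/1/2/3 for OK/WARNING/CRITICAL/UNKNOWN, and performance data after a `|` on the first output line.

```python
import subprocess
import sys

STATE_NAMES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_check(argv, timeout=10):
    """Run one external check and return (state, text, perfdata)."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return 3, f"could not run check: {exc}", ""
    # Exit codes outside 0..3 are treated as UNKNOWN.
    state = proc.returncode if proc.returncode in STATE_NAMES else 3
    lines = proc.stdout.splitlines()
    first_line = lines[0] if lines else ""
    # Everything after the first "|" on the first line is performance data.
    text, _, perfdata = first_line.partition("|")
    return state, text.strip(), perfdata.strip()


if __name__ == "__main__":
    # Stand-in "plugin" (hypothetical): prints one status line, exits with 1.
    fake_plugin = [
        sys.executable,
        "-c",
        "import sys; print('WARNING - load 5.2 | load1=5.2;4;8;0;'); sys.exit(1)",
    ]
    state, text, perfdata = run_check(fake_plugin)
    print(STATE_NAMES[state], "/", text, "/", perfdata)
```

A scheduler is then little more than a loop that calls something like `run_check` for each configured command at its interval and records the latest result.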





         Status.dat is updated 3-6 times a minute:
         ########################################
         # NAGIOS STATE RETENTION FILE
         #
         # THIS FILE IS AUTOMATICALLY GENERATED
         # BY NAGIOS. DO NOT MODIFY THIS FILE!
         ########################################
         info {
         created=1330182965
         version=3.2.3
         last_update_check=0
         update_available=0
         update_uid=1330021387
         last_version=
         new_version=
         }
         program {
         modified_host_attributes=0
         modified_service_attributes=0
         enable_notifications=1
         active_service_checks_enabled=1
         passive_service_checks_enabled=1
         active_host_checks_enabled=1
         passive_host_checks_enabled=1
         enable_event_handlers=1
         obsess_over_services=0
         obsess_over_hosts=0
         check_service_freshness=1
         check_host_freshness=0
         enable_flap_detection=1
         enable_failure_prediction=1
         process_performance_data=1
         global_host_event_handler=
         global_service_event_handler=
         next_comment_id=40
         next_downtime_id=1
         next_event_id=572
         next_problem_id=290
         next_notification_id=457
         }
         host {
         host_name=Compellent
         modified_attributes=0
         check_command=check-mk-ping
         check_period=24X7
         notification_period=24X7
         event_handler=
         has_been_checked=1
         check_execution_time=0.013
         check_latency=0.135
         check_type=0
         current_state=0
         last_state=0
         last_hard_state=0
         last_event_id=0
         current_event_id=0
         current_problem_id=0
         last_problem_id=0
         plugin_output=OK - 10.10.99.79: rta 0.785ms, lost 0%
         long_plugin_output=
         performance_data=rta=0.785ms;200.000;500.000;0; pl=0%;40;80;; rtmax=1.628ms;;;; rtmin=0.426ms;;;;
         last_check=1330182913
         next_check=1330182974
         check_options=0
         current_attempt=1
         max_attempts=1
         normal_check_interval=1.000000
         retry_check_interval=1.000000
         state_type=1
         last_state_change=1330021647
         last_hard_state_change=1330021647
         last_time_up=1330182914
         last_time_down=0
         last_time_unreachable=0
         notified_on_down=0
         notified_on_unreachable=0
         last_notification=0
         current_notification_number=0
         current_notification_id=0
         notifications_enabled=1
         state_history=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
         }




         More status.dat:
         service {
         host_name=ebs-soa1
         service_description=CPU load
         modified_attributes=0
         check_command=check_mk-cpu.loads
         check_period=24X7
         notification_period=24X7
         event_handler=
         has_been_checked=1
         check_execution_time=0.000
         check_latency=0.316
         check_type=1
         current_state=0
         last_state=0
         last_hard_state=0
         last_event_id=0
         current_event_id=0
         current_problem_id=0
         last_problem_id=0
         current_attempt=1
         max_attempts=1
         normal_check_interval=1.000000
         retry_check_interval=1.000000
         state_type=1
         last_state_change=1330021648
         last_hard_state_change=1330021648
         last_time_ok=1330182934
         last_time_warning=0
         last_time_unknown=0
         last_time_critical=0
         plugin_output=OK - 15min Load 0.19 at 8 CPUs
         long_plugin_output=
         performance_data=load1=0.25;40;80;0; load5=0.25;40;80;0; load15=0.19;40;80;0;
         last_check=1330182934
         next_check=0
         check_options=0
         notified_on_unknown=0
         notified_on_warning=0
         notified_on_critical=0
         current_notification_number=0
         current_notification_id=0
         last_notification=0
         notifications_enabled=1
         active_checks_enabled=0
         passive_checks_enabled=1
         event_handler_enabled=0
         problem_has_been_acknowledged=0
         acknowledgement_type=0
         flap_detection_enabled=1
         failure_prediction_enabled=1
         process_performance_data=1
         obsess_over_service=1
         is_flapping=0
         percent_state_change=0.00
         check_flapping_recovery_notification=0
         state_history=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
         }




            The file goes on for about 4 more megabytes. You will never read this.
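
Nobody reads this file by hand, but it is easy to read by program. The sketch below assumes only the layout shown on these slides: a block opens with `type {`, closes with `}`, and holds one `key=value` pair per line. The function name `parse_status_dat` and the shortened sample are made up for illustration.

```python
def parse_status_dat(text):
    """Return a list of (block_type, {key: value}) tuples."""
    blocks = []
    block_type, fields = None, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the comment banner
        if block_type is None:
            if line.endswith("{"):
                block_type = line[:-1].strip()
                fields = {}
        elif line == "}":
            blocks.append((block_type, fields))
            block_type, fields = None, None
        else:
            # Split on the first "=" only; values may contain "=" themselves.
            key, _, value = line.partition("=")
            fields[key] = value
    return blocks


# Shortened sample taken from the blocks shown above.
SAMPLE = """\
host {
host_name=Compellent
current_state=0
plugin_output=OK - 10.10.99.79: rta 0.785ms, lost 0%
}
service {
host_name=ebs-soa1
service_description=CPU load
current_state=0
plugin_output=OK - 15min Load 0.19 at 8 CPUs
}
"""

if __name__ == "__main__":
    for block_type, fields in parse_status_dat(SAMPLE):
        if block_type == "service":
            print(fields["host_name"], fields["service_description"], fields["current_state"])
```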



   When you think of Nagios, you think of its web output:


  Nagios’ CGI is a visualization engine. It is written in C. It runs on Linux.

  Input:
            status.dat
  Output:
            web pages describing what’s in status.dat
  Input:
            Mouse clicks from the operator
  Output:
            changes to what is viewed, and changes to Nagios’ current state










  There are lots of books about using Nagios. I have read most of them,
  and they all helped me out.

  A good Nagios implementation is a study in organizational behavior.

  If you run Nagios, and you find that no one else in the organization ever fixes anything
  based on Nagios findings, stop looking at the screen and start talking to people.

  Socially, using Nagios successfully forces you to involve your co-workers. They
  have to “buy in” to the Nagios outputs, which means they have to understand
  what it does and how it reports its findings.

  Managerially, keeping the email notification flood from Nagios under control is a
  prerequisite if you want anyone to actually use an email as a basis for corrective action.
  Festival (through a loudspeaker) and SMS work great. A report based on an SQL query works
  great.

  Creating a Navy-like “Officer of the watch” worked in the U.K.




    Doing Nagios means:

    Visit a server, and poke around. List what is important to check;

    For each important thing you want to check:

    Find or write some code to check it

    Set limits which your code can test to decide whether what you are
    checking is OK, or not

    Schedule the code to run over and over

    If the test is not OK, send a message to the interested party (email seems
    to be a favorite).





       Find or write some code to check it

       Grab a check from Nagios libexec apps – C, Perl, Python, bash –
       and put it where it can perform the check

       Set limits so your code can decide whether it’s OK or not

       Configure command line parameters for the check where it will
       be called

       Schedule the code to run over and over

       Reconfigure Nagios’s inputs to include the check and run it,
       perhaps using a transport mechanism

       If the test is not OK, send a message to the interested party

       Nagios checks return state to Nagios, which can fork to send
       notifications
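
As a rough illustration of "write some code" and "set limits", here is a minimal disk-usage check in Python. It is a sketch, not one of the stock Nagios plugins: the names `evaluate` and `check_disk` and the 80/90 percent limits are assumptions chosen for the example. The state numbers and the `text | perfdata` output line follow the usual plugin convention.

```python
import shutil

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(value, warn, crit):
    """Map a measurement to a Nagios state using upper limits."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def check_disk(path, warn=80.0, crit=90.0):
    """Return (state, status_line) for the filesystem holding `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path} reports zero size"
    percent = 100.0 * usage.used / usage.total
    state = evaluate(percent, warn, crit)
    label = ["OK", "WARNING", "CRITICAL"][state]
    line = (
        f"DISK {label} - {percent:.1f}% used on {path}"
        f" | used_pct={percent:.1f}%;{warn};{crit};0;100"
    )
    return state, line


if __name__ == "__main__":
    state, line = check_disk("/")
    print(line)
    # A real plugin would finish with sys.exit(state) so Nagios sees the state.
```

The limits would normally arrive as command line parameters (the `-w` and `-c` style seen in the stock plugins), which is what "configure command line parameters for the check" refers to.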


Nagios transport systems:
                                          ACTIVE CHECKS




                                          PASSIVE CHECKS




                                          EXPORT STATE




                                         SSH WORKS AS WELL…

    PROBLEMS WITH THE TRADITIONAL NAGIOS APPROACH:
    How many times do you have to visit each server?
    How many times do you have to modify Nagios’s input files?
    How many times do you discover something you are not monitoring?
    Is all this worth it?






    Home is no better. Can you count the servers?





  Welcome to Check_MK!







   So how about a new approach to managing Nagios?

   1. Write a shell script which checks everything you can think of checking on a Linux
       box in one operation

   2. Send this script to each server once. Configure each server’s xinetd so that the
       script can be called using port 6556

   3. Remotely run this script and feed its output to a process which writes a separate
       Nagios configuration for each “service” found

   4. Schedule Nagios to run a single check once a minute: call the remote shell script
       over port 6556, and process the result in the “check” itself

   5. The check returns each individual “service” measurement it finds to Nagios by
       writing to the Nagios passive “external command file”
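
Step 5 can be sketched as follows. `PROCESS_SERVICE_CHECK_RESULT` is a standard Nagios external command; the helper names, the command file path passed to `submit`, and the second sample result (`fs_/opt`) are assumptions made up for this example, and Check_MK's own delivery code may well work differently.

```python
import time


def passive_result(host, service, state, output, now=None):
    """Format one PROCESS_SERVICE_CHECK_RESULT external command line."""
    timestamp = int(time.time() if now is None else now)
    return f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{state};{output}\n"


def submit(lines, command_file):
    """Write command lines to the Nagios external command file (a named pipe).

    `command_file` is site-specific; no default path is assumed here.
    """
    with open(command_file, "w") as pipe:
        pipe.writelines(lines)


if __name__ == "__main__":
    # One agent call yields many service results; each becomes one line.
    results = [
        ("ebs-soa1", "CPU load", 0, "OK - 15min Load 0.19 at 8 CPUs"),
        ("ebs-soa1", "fs_/opt", 1, "WARN - 85% used"),  # made-up example
    ]
    for host, service, state, output in results:
        print(passive_result(host, service, state, output, now=1330182934), end="")
```

This is why a single active check per host is enough: Nagios schedules one call, and every other service on that host arrives as a passive result.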






    Write a shell script which checks everything you can think of checking on a Linux
     box in one operation

   It is already written for you by M.K., for HP-UX, Linux and Windows, probably others.

    Send this script to each server once. Configure each server’s xinetd so that the
     script can be called using port 6556

   Installing the shell script, creating directories, and reconfiguring and restarting xinetd
       are done for you by the check_mk_agent.rpm or .deb






  Getting the check_mk agent installed on a Linux box:



  OMD[torrance]:~$ scp /home/kelvinv/check_mk-agent-1.1.12p6-1.noarch.rpm root@ebsprod-is1:
  root@ebsprod-is1's password:

  OMD[torrance]:~/etc/check_mk$ ssh ebsprod-is1
  torrance@ebsprod-is1's password:

  [root@ebsprod-is1 ~]# rpm -Uhv check_mk-agent-1.1.12p6-1.noarch.rpm
  Preparing...                ########################################### [100%]
     1:check_mk-agent         ########################################### [100%]
  Activating startscript of xinetd
  Reloading xinetd...
  Reloading configuration: [ OK ]




      #!/bin/bash
      # +------------------------------------------------------------------+
      # |             ____ _               _        __  __ _  __           |
      # |            / ___| |__   ___  ___| | __   |  \/  | |/ /           |
      # |           | |   | '_ \ / _ \/ __| |/ /   | |\/| | ' /            |
      # |           | |___| | | |  __/ (__|   <    | |  | | . \            |
      # |            \____|_| |_|\___|\___|_|\_\___|_|  |_|_|\_\           |
      # |                                                                  |
      # | Copyright Mathias Kettner 2010             mk@mathias-kettner.de |
      # +------------------------------------------------------------------+
      #
      # This file is part of Check_MK.
      # The official homepage is at http://mathias-kettner.de/check_mk.
      #
      # check_mk is free software; you can redistribute it and/or modify it
      # under the terms of the GNU General Public License as published by
      # the Free Software Foundation in version 2. check_mk is distributed
      # in the hope that it will be useful, but WITHOUT ANY WARRANTY; with-
      # out even the implied warranty of MERCHANTABILITY or FITNESS FOR A
      # PARTICULAR PURPOSE. See the GNU General Public License for more de-
      # tails.
      # Remove locale settings to eliminate localized outputs where possible
      export LC_ALL=C
      unset LANG

      export MK_LIBDIR="/usr/lib/check_mk_agent"
      export MK_CONFDIR="/etc/check_mk"

      # Make sure, locally installed binaries are found
      PATH=$PATH:/usr/local/bin

More tests in check_mk_agent.linux:
echo   '<<<check_mk>>>'
echo   Version: 1.1.12p6
echo   AgentOS: linux
echo   PluginsDirectory: $PLUGINSDIR
echo   LocalDirectory: $LOCALDIR
echo   AgentDirectory: $MK_CONFDIR

# If we are called via xinetd, try to find only_from configuration
if [ -n "$REMOTE_HOST" ]
then
     echo -n 'OnlyFrom: '
     echo $(sed -n '/^service[[:space:]]*check_mk/,/}/s/^[[:space:]]*only_from[[:space:]]*=[[:space:]]*\(.*\)/\1/p' /etc/xinetd.d/* | head -n1)
fi

# Partitions (-P prevents line wrapping for long mount points)
# Note: NFS mounts are always excluded in order to avoid
# hangs. They are better monitored on the server than on
# the client anyway.
echo '<<<df>>>'
df -PTlk -x smbfs -x tmpfs -x cifs -x iso9660 -x udf -x nfsv4 | sed 1d
# VMWare shows its own filesystems with 'vdf'. Just one
# problem: it outputs not 7 but only 6 columns
if which vdf > /dev/null
then
   vdf -P | grep ^/vmfs/volumes | sed 's/ / vmfs /'
fi


More tests in check_mk_agent.linux:

# Check mount options. Filesystems may switch to 'ro' in case
# of a read error.
echo '<<<mounts>>>'
grep ^/dev < /proc/mounts

# processes including username, without kernel processes
echo '<<<ps>>>'
ps ax -o user,vsz,rss,pcpu,command --columns 10000 | sed -e 1d -e 's/ *\([^ ]*\) *\([^ ]*\) *\([^ ]*\) *\([^ ]*\) */(\1,\2,\3,\4) /'






Running check_mk_agent – “telnet <remote host> 6556” :

Connected to nagios.
Escape character is '^]'.
<<<check_mk>>>
Version: 1.1.12p6
AgentOS: linux
PluginsDirectory: /usr/lib/check_mk_agent/plugins
LocalDirectory: /usr/lib/check_mk_agent/local
AgentDirectory: /etc/check_mk
OnlyFrom:
<<<df>>>
/dev/mapper/tom--rp--debian-root ext3    9607396    3714444   5404916    41% /
/dev/xvda1    ext2      233191     30735     190015       14% /boot
/dev/xvdb1    ext4    30961664   7003764 22385140         24% /opt
<<<nfsmounts>>>
<<<mounts>>>
/dev/mapper/tom--rp--debian-root / ext3 rw,relatime,errors=remount-ro,data=ordered 0 0
/dev/xvda1 /boot ext2 rw,relatime,errors=continue 0 0
/dev/xvdb1 /opt ext4 rw,relatime,barrier=1,data=ordered 0 0
<<<ps>>>
(root,8356,808,0.0) init [2]
(root,0,0,0.0) [kthreadd]
(root,0,0,0.0) [migration/0]
(root,0,0,0.0) [ksoftirqd/0]
(root,0,0,0.0) [ksoftirqd/1]
/etc/
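
The same output can be fetched and split by program instead of by telnet. This is a minimal sketch that assumes only what the transcript above shows: the agent listens on TCP port 6556, sends its text, closes the connection, and marks each section with a `<<<name>>>` header line. The function names are made up for this example and are not Check_MK's own.

```python
import re
import socket

SECTION_HEADER = re.compile(r"^<<<([^>]+)>>>$")


def fetch_agent_output(host, port=6556, timeout=10):
    """Read everything the agent sends until it closes the connection."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def split_sections(output):
    """Return {section_name: [lines]} keyed by the <<<name>>> headers."""
    sections = {}
    current = None
    for line in output.splitlines():
        match = SECTION_HEADER.match(line.strip())
        if match:
            current = sections.setdefault(match.group(1), [])
        elif current is not None:
            current.append(line)  # lines before the first header are ignored
    return sections


if __name__ == "__main__":
    sample = (
        "<<<check_mk>>>\n"
        "Version: 1.1.12p6\n"
        "AgentOS: linux\n"
        "<<<df>>>\n"
        "/dev/xvda1    ext2      233191     30735     190015       14% /boot\n"
        "<<<nfsmounts>>>\n"
        "<<<mounts>>>\n"
        "/dev/xvda1 /boot ext2 rw,relatime,errors=continue 0 0\n"
    )
    for name, lines in split_sections(sample).items():
        print(name, len(lines))
```

Against a live host, `split_sections(fetch_agent_output("ebsprod-is1"))` would give the same structure, provided the agent's xinetd `only_from` setting allows the caller.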



   Remotely run this script and feed its output to a process which writes a separate
    Nagios configuration for each “service” found

  Edit /opt/omd/<site>/etc/check_mk/main.mk to add a host, or use WATO, then:

  > check_mk -I <hostname>

   Schedule Nagios to run a single check once a minute: call the remote shell script
    over port 6556, and process the result in the “check” itself

   The check returns each individual “service” measurement it finds to Nagios by
    writing to the Nagios passive “external command file”

  > check_mk -O

   Update the whole Nagios configuration for a server which has a new
    configuration

  > check_mk -II <hostname>
  > check_mk -O

  And loop forever…




Again, to test and see what check_mk_agent will report on your server, install the RPM and
   then, locally, run

> telnet localhost 6556

To see what configuration has been created for Nagios, look at these files on the
Nagios server:

> less /opt/omd/<site>/etc/nagios/conf.d/check_mk_objects.cfg
> less /opt/omd/<site>/etc/nagios/conf.d/check_mk_templates.cfg






      Charts and graphs:

      Long-term performance history is important: for SLAs, and for correlating events over
      time.

      Nagios keeps almost no history.

      MySQL can save history, but needs maintenance (it fills up and Nagios stalls).
      Besides, you still need to “visualize” what is going on.

      RRD is a great database service to keep temporal history. It never fills up.

      RRD includes visualization tools (graphs)

      Traditionally, it has been a job to incorporate RRD into Nagios, usually using
      3rd party packages.
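
Why a round-robin database "never fills up" can be shown with a toy model: storage is a fixed number of slots, and each new sample overwrites the oldest one. The class below is only an illustration of that idea in Python. It is not rrdtool's API or file format, and real RRD files also consolidate samples into coarser averages, which this sketch leaves out.

```python
from collections import deque


class RoundRobinArchive:
    """Toy fixed-size archive: the newest sample pushes out the oldest."""

    def __init__(self, slots):
        self.samples = deque(maxlen=slots)

    def update(self, timestamp, value):
        self.samples.append((timestamp, value))

    def average(self):
        if not self.samples:
            return None
        return sum(value for _, value in self.samples) / len(self.samples)


if __name__ == "__main__":
    archive = RoundRobinArchive(slots=5)
    for minute in range(8):
        archive.update(minute * 60, float(minute))
    # Eight samples went in, but only the five newest are kept.
    print(len(archive.samples), archive.samples[0], archive.average())
```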








      “PNP is an addon to Nagios which analyzes performance data provided by
      plugins and stores them automatically into RRD-databases (Round Robin
      Databases, see RRD Tool).

      During development of PNP we set value on easy installation and little
      maintenance while running it. An administrator should do other things than
      configure graphing tools.”
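
The "performance data provided by plugins" is the `label=value;warn;crit;min;max` string seen earlier in status.dat. A rough sketch of taking it apart is below. The function name `parse_perfdata` is made up for this example, and the sketch assumes labels contain no spaces and treats range-style thresholds as missing. It is not PNP's own parser.

```python
import re

VALUE_WITH_UNIT = re.compile(r"^(-?[0-9.]+)([a-zA-Z%]*)$")


def to_float(text):
    """Return a float, or None for empty or non-numeric fields."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_perfdata(perfdata):
    """Return {label: {"value", "unit", "warn", "crit", "min", "max"}}."""
    metrics = {}
    for item in perfdata.split():
        label, _, rest = item.partition("=")
        parts = rest.split(";")
        match = VALUE_WITH_UNIT.match(parts[0])
        if not match or to_float(match.group(1)) is None:
            continue  # not a usable measurement
        # Pad so that missing trailing fields become empty strings.
        parts += [""] * (5 - len(parts))
        metrics[label] = {
            "value": to_float(match.group(1)),
            "unit": match.group(2),
            "warn": to_float(parts[1]),
            "crit": to_float(parts[2]),
            "min": to_float(parts[3]),
            "max": to_float(parts[4]),
        }
    return metrics


if __name__ == "__main__":
    # Performance data of the ping check shown in status.dat above.
    line = "rta=0.785ms;200.000;500.000;0; pl=0%;40;80;; rtmax=1.628ms;;;; rtmin=0.426ms;;;;"
    for label, metric in parse_perfdata(line).items():
        print(label, metric["value"], metric["unit"], metric["warn"], metric["crit"])
```

Each label then becomes one data series that a tool like PNP can store in an RRD file and graph.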






 Besides configuring Nagios and RRD, other things an administrator should be doing include
 documentation.

 Wouldn’t it be great if you could move between the Nagios CGI screens, the PNP4Nagios
 Charts and a documentation Wiki?

 DokuWiki has worked for me.

 Also used in OMD for users, passwords, privileges across OMD applications (e.g. NagVis)


                        “DokuWiki is a standards compliant, simple to use Wiki, mainly aimed at
                        creating documentation of any kind. It is targeted at developer teams,
                        workgroups and small companies. It has a simple but powerful syntax which
                        makes sure the datafiles remain readable outside the Wiki and eases the
                        creation of structured texts. All data is stored in plain text files – no
                        database is required.”






  Here’s the punch line:






   OMD Quick introduction
   First install the package matching your operating system:

   # zypper install omd-0.50-sles11sp1-25.x86_64.rpm

   Now create a monitoring instance (OMD calls this "site"):

   # omd create UULAC

   And let's start Nagios and all other processes:

   # omd start UULAC

   Other OMD features:
   •Run several monitoring sites in parallel
   •Install and use several different versions of OMD in parallel
   •Easily update, duplicate, rename and manage sites



  What OMD contains, Page 1



nagios-3.2.3               The current version of Nagios
     nagios-plugins        Standard external apps which take and report measurements
     Nsca                  The listener for passive checks from remote servers
     check_nrpe            The check application which calls checks on remote hosts
Shinken-0.6.99             (drop-in Nagios replacement, a whole world to explore)
Nagvis                     The management-level view of state – live maps, schematics
Pnp4nagios                 RRD and useful graphs. Compare services across hosts.
rrdtool/rrdcached
Check_MK                   God’s gift to the sysadmin
MK Livestatus              replace status.dat with a callable data provider
Multisite                  Easily add additional monitoring sites.





  What OMD contains, Page 2


   Dokuwiki                   A nice no-SQL wiki linked from Check_MK’s screens
   Thruk                      A Perl CGI to view Nagios state (unexplored)
   Mod-Gearman                Process queue manager, reduces Nagios fork load
   check_logfiles             Locally read log files and report to Nagios
   check_oracle_health        Locally perform several Oracle DB checks
   check_mysql_health         Locally perform several MySQL DB checks
   Jmx4perl                   (unexplored)
   check_webinject            wget-like web site checker, easy to use from Nagios
   check_multi                The all singing, all dancing, Python-writing Nagios check





  The Check_MK dashboard (actually called “Multisite”):





You configure check_mk by editing ~/etc/check_mk/main.mk:

  # KCV Dec 2011
  snmp_default_community = 'public'

  snmp_communities = [
          ( "SunriderR0!", ["UCS"] ),
          ( "SunriderR0!", ["Compellent"] ),
  ]

  monitoring_host = "nagios"

  ntp_default_levels = (10, 80.0, 110.0)

  # hosts not added in WATO
  all_hosts = [
          "copy-server|linux|dev|tcp",
          "ebsprod-ap1|linux|dev|tcp",
          "ebsprod-ap2|linux|dev|tcp",
          "ebsprod-db1|linux|dev|tcp",
          "ebsprod-db2|linux|dev|tcp",
          "fortunedelight|linux|dev|tcp",
          "ip158|linux|dev|tcp",
          "istore-1|linux|dev|tcp",
          "istore-uat|linux|dev|tcp",
          "landing-page|windows|ping",
          "pci-kickstart|dev|linux|tcp",
          "soa11g|linux|dev|tcp",
          "xbiz1-ap1|linux|dev|tcp",
          "xbiz3-db1|linux|dev|tcp",
          "xuat1-is1|linux|dev|tcp",
  ]


 Getting anything else into Nagios’s cfg file:
  extra_nagios_conf += r"""
  define command {
      command_name    check-ping
      command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5
  }
  define command{
      command_name    check_dp_pool
      command_line    $USER1$/check_dp_pool.pl -w $ARG1$ -c $ARG2$ $ARG3$
  }
  define command{
      command_name    check_by_ssh_kel
      command_line    $USER1$/check_by_ssh -H $HOSTADDRESS$ -t 20 -l kelvinv -C $ARG1$
  }
  define command{
      command_name    check_sunrider_dig
      command_line    $USER1$/check_dig -l $HOSTNAME$.sunrider.com -H 10.10.2.1
  }
  define command{
      command_name    check_gearman
      command_line    $USER1$/check_gearman -H localhost
  }
  define command{
      command_name    check_http
      command_line    $USER1$/check_http -H $HOSTADDRESS$ -s $ARG1$
  }
  """





  Modifying what check_mk -II writes into the Nagios .cfg files:


      extra_service_conf["notification_options"] = [
              ( "n", ALL_HOSTS, ["NTP Time"] ),
              ( "n", ALL_HOSTS, ["CUPS Queue.*"] ),
              ( "c,r", ALL_HOSTS, ["Ping"] ),
              ( "n", ALL_HOSTS, ["lpstat_queue"] ),
              ( "n", ALL_HOSTS, ["Gearman"] ),
      ]

      extra_service_conf["normal_check_interval"] = [
        ( "2", ["hp"], ALL_HOSTS, ["Check_MK"] ),
        ( "5", ["db"], ALL_HOSTS, ["ASM disk"] ),
        ( "3", ALL_HOSTS, ["YP client"] ),
      ]






  Add active checks (not run by check_mk_agent) into the Nagios cfg file:



legacy_checks = [

( ( "check-ping!500,10%!1000,20%", "Ping", True), [ "tcp" ], ALL_HOSTS ),

( ( "check_sunrider_dig", "DNS Entry", True), [ "tcp" ], ALL_HOSTS ),

( ( "check_dp_pool!600000!300000!VAULT", "pool_VAULT", True), [ "sunuxdp" ] ),

( ( "check_gearman", "Gearman", True), [ "nagios" ] ),

( ( "check_http!'ibeCCtdMinisites.jsp?language=US'", "iStore-1 web", True), [ "istore-1" ] ),

( ( "check_by_ssh_kel!'/opt/nrpe/libexec/check_ypwhich.sh'", "YP client", True),
  [ "itauxap1", "itauxap2" ] ),

]
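
The first string in each entry is a Nagios check command: the command name, then arguments separated by "!", which Nagios substitutes for $ARG1$, $ARG2$ and so on in the matching command_line. A minimal sketch of that substitution follows. The expand_check_command helper, the plugin directory used for $USER1$, and the host address are assumptions for illustration only.

```python
def expand_check_command(check_command, command_lines, macros):
    """Expand a Nagios-style 'name!arg1!arg2' string into a full command line."""
    name, *args = check_command.split("!")
    line = command_lines[name]
    # $ARG1$, $ARG2$, ... come from the "!"-separated arguments.
    for index, arg in enumerate(args, start=1):
        line = line.replace("$ARG%d$" % index, arg)
    # Remaining macros such as $USER1$ and $HOSTADDRESS$ come from Nagios itself.
    for key, value in macros.items():
        line = line.replace("$%s$" % key, value)
    return line


command_lines = {
    "check-ping": "$USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5",
}
macros = {
    "USER1": "/usr/lib/nagios/plugins",  # hypothetical plugin directory
    "HOSTADDRESS": "10.10.2.7",          # hypothetical host address
}

print(expand_check_command("check-ping!500,10%!1000,20%", command_lines, macros))
```

With these inputs the Ping entry runs check_ping with a warning threshold of 500,10% and a critical threshold of 1000,20%.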






  host_groups = [
          ( "Production",         [   "prod" ],       ALL_HOSTS   ),
          ( "Test",               [   "test" ],       ALL_HOSTS   ),
          ( "Development",        [   "dev" ],        ALL_HOSTS   ),
          ( "Production PCI",     [   "prod_pci" ],   ALL_HOSTS   ),
          ( "Business Analyst",   [   "ba" ],         ALL_HOSTS   ),
          ( "Backup",             [   "backup" ],     ALL_HOSTS   ),
          ( "Database",           [   "db" ],         ALL_HOSTS   ),
          ( "Application",        [   "ap" ],         ALL_HOSTS   ),
          ( "Storage",            [   "store" ],      ALL_HOSTS   ),
          ( "Monitors",           [   "mon" ],        ALL_HOSTS   ),
          ( "Infrastructure",     [   "infra" ],      ALL_HOSTS   ),
          ( "Networking",         [   "net" ],        ALL_HOSTS   ),
          ( "Physical",           [   "phy" ],        ALL_HOSTS   ),
          ( "VMware",             [   "vmware" ],     ALL_HOSTS   ),
          ( "Xen",                [   "xen" ],        ALL_HOSTS   ),
          ( "Oracle VM",          [   "oravm" ],      ALL_HOSTS   ),
          ( "HP-UX",              [   "hp" ],         ALL_HOSTS   ),
          ( "Linux",              [   "linux" ],      ALL_HOSTS   ),
          ( "otheros",            [   "Other OS" ],   ALL_HOSTS   ),
          ( "priv",               [   "Private" ],    ALL_HOSTS   ),
  ]






  Tell check_mk to ignore data returned by check_mk_agent:



  ignored_services = [


  ( [ "eprodap1", "eprodap2", "eproddb1", "eproddb2", "ebspatch-ap2", "ebstest-ap1",
  "ebstest-db1", "ip94", "ip88pci" ], [ "IPMI Sensor Fan_Fan_[0-9]$"] ),


  ( [ "eprodap1", "eprodap2", "eproddb1", "eproddb2", "ebspatch-ap2", "ebstest-ap1",
  "ebstest-db1", "ip94", "ip88pci" ], [ "IPMI Sensor Power_Unit_VRM_[0-9]$"] ),


  ( [ "itauxap2" ],      [ "Logical Volume /dev/vg00/lvol2$" ] ),


  ( [ "itauxdev"],       [ "asm_procs$" ], ),

  ]
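
The trailing "$" in these patterns matters. Assuming the patterns are regular expressions matched from the start of the service description (an assumption about Check_MK's matching, modelled here with Python's re.match), the short sketch below shows what the anchor changes. The is_ignored helper and the sensor names passed to it are made up for illustration.

```python
import re


def is_ignored(service, patterns):
    """True if any pattern matches the service description from its start."""
    return any(re.match(pattern, service) for pattern in patterns)


fan_patterns = ["IPMI Sensor Fan_Fan_[0-9]$"]

# A single-digit fan sensor matches the pattern and would be ignored.
print(is_ignored("IPMI Sensor Fan_Fan_3", fan_patterns))
# A two-digit fan sensor does not match, because "$" demands the end of the
# string right after one digit.
print(is_ignored("IPMI Sensor Fan_Fan_12", fan_patterns))
# Without the "$", a prefix match would also catch longer names.
print(is_ignored("asm_procs_extra", ["asm_procs"]))
```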





  Tell check_mk to use these warning and critical thresholds. check_mk evaluates them
  itself and passes the results to Nagios as passive checks. They are not in the Nagios cfg files!
check_parameters = [

( (90, 95), [ "copy-server" ],      [ "fs_/u99$" ] ),

( (90, 95), [ "ebspatch-ap2" ], [ "fs_/u01$" ] ),

( (96, 98), [ "ebspatch-ap2" ], [ "fs_/u01/oracle/EBSTEST/db/apps_st/data$" ] ),

( (92, 96), [ "ebstest-ap1" ],      [ "fs_/u01$" ] ),

( (93, 95), [ "ebs-ap1", "ebs-db1", "ebs-ap2", "ebs-db2" ], [ "fs_/u01$" ] ),

( (75, 150), [ "ebs-db3" ],          [ "ORA EBSAP31 Sessions$" ] ),

( (90, 90), [ "eprodap1" ],         [ "fs_/u01$" ] ),

( (85, 90), [ "eprodap1", "eprodap2" ], [ "fs_/home$" ] ),

( (85, 90), [ "eprodap1", "eprodap2", "eproddb1" ], [ "fs_/u01/storage$" ] ),
]
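
Each (warn, crit) pair becomes a state once a measurement arrives. The sketch below shows one plausible reading, assuming the filesystem values are percent used and that reaching a level is enough to trigger it. The state_for helper is hypothetical, and the comparison Check_MK actually applies may differ in detail.

```python
OK, WARNING, CRITICAL = 0, 1, 2  # Nagios state numbers


def state_for(value, levels):
    """Map a measured value to a Nagios state using (warn, crit) levels."""
    warn, crit = levels
    # Check the critical level first so it wins when warn == crit.
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


print(state_for(91, (90, 95)))  # between warn and crit
print(state_for(90, (90, 90)))  # warn equals crit, as in the eprodap1 rule
```

Note the (90, 90) rule for eprodap1: with equal levels the service never shows WARNING and goes straight to CRITICAL.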




Copies of the check_mk_agent scripts are installed here:

This demonstrates how OMD keeps its versions nicely separated.


OMD[torrance]:~/etc/check_mk$ locate check_mk_agent
/opt/omd/versions/0.50/share/check_mk/agents/check_mk_agent.aix
/opt/omd/versions/0.50/share/check_mk/agents/check_mk_agent.freebsd
/opt/omd/versions/0.50/share/check_mk/agents/check_mk_agent.hpux
/opt/omd/versions/0.50/share/check_mk/agents/check_mk_agent.linux
/opt/omd/versions/0.50/share/check_mk/agents/check_mk_agent.solaris
/opt/omd/versions/0.50/share/check_mk/agents/windows/check_mk_agent.cc
/opt/omd/versions/0.50/share/check_mk/agents/windows/check_mk_agent.exe
/opt/omd/versions/0.50/share/doc/check_mk/treasures/check_mk_agent.hp
/opt/omd/versions/0.52/share/check_mk/agents/check_mk_agent.aix
/opt/omd/versions/0.52/share/check_mk/agents/check_mk_agent.freebsd
/opt/omd/versions/0.52/share/check_mk/agents/check_mk_agent.hpux
/opt/omd/versions/0.52/share/check_mk/agents/check_mk_agent.linux
/opt/omd/versions/0.52/share/check_mk/agents/check_mk_agent.solaris
/opt/omd/versions/0.52/share/check_mk/agents/windows/check_mk_agent.cc
/opt/omd/versions/0.52/share/check_mk/agents/windows/check_mk_agent.exe





 Example of adding a custom check run on a remote host using ssh:
In main.mk:
legacy_checks = [
  ( ( "check_by_ssh_kel!'/opt/nrpe/libexec/check_ypwhich.sh'", "YP client", True),
    [ "itauxap1", "itauxap2" ] ),
]

define command{
    command_name     check_by_ssh_kel
    command_line     $USER1$/check_by_ssh -H $HOSTADDRESS$ -t 20 -l kelvinv -C $ARG1$
}

On itauxap1 at /opt/nrpe/libexec/check_ypwhich.sh:
#!/bin/sh
# Report CRITICAL unless ypwhich names the expected NIS server.
SERVER=`ypwhich`
# Quote $SERVER so the tests still work when it is empty.
if [ -z "$SERVER" ]
then
           echo "CRITICAL: ypwhich NULL"
           exit 2
fi
if [ "$SERVER" != "itauxap1.sunrider.com" ]
then
           echo "CRITICAL: ypwhich INCORRECT: $SERVER"
           exit 2
fi
echo "ypwhich OK: $SERVER"
exit 0





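     Another custom check; it returns the printer queue depth on
     HP-UX.

     KISS!

             # more check_lpstat-o.sh
             #!/bin/sh
             # Count the lines of lpstat -o output; strip the padding wc may add.
             COUNT=`lpstat -o | wc -l | tr -d ' '`
             # Quote the line so "|" is printed as the performance-data separator
             # instead of being treated as a shell pipe.
             echo "lpstat queue $COUNT | queue=$COUNT"
             exit 0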
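
Both custom checks follow the Nagios plugin convention: one line of output, optional performance data after a "|", and an exit code of 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The same output line can be built in Python. This is a minimal sketch: the plugin_line helper is hypothetical, and the queue depth of 3 is a made-up value.

```python
def plugin_line(text, perfdata):
    """Build one line of plugin output: text, then "|", then label=value pairs."""
    perf = " ".join("%s=%s" % (label, value) for label, value in perfdata.items())
    if perf:
        return "%s | %s" % (text, perf)
    return text


# Example with a made-up queue depth of 3.
print(plugin_line("lpstat queue 3", {"queue": 3}))
```

A real plugin would print this line and then exit with the state code, as the shell scripts above do.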






  livestatus.o:

  It is a replacement for Nagios’s status.dat output method.

  Like NDO, it uses the Nagios Event Broker API and loads as a module into Nagios.
  Unlike NDO, it does not write anything out; it just responds to queries.

  Used by Check_MK Multisite, NagVis, and Thruk to populate their CGIs with data.

  Of course, it’s automatically set up in OMD…


   In nagios.cfg:
   broker_module=/usr/local/lib/mk-livestatus/livestatus.o /var/run/nagios/rw/live



   Anyone using “Livestatus Query Language”?
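
For anyone who has not tried it: a Livestatus query is a few lines of text written to the socket named in nagios.cfg, starting with "GET <table>" and ending with a blank line. A minimal Python sketch follows. The helper names are hypothetical, and the socket path is the one from the nagios.cfg line above, so adjust it for your site.

```python
import json
import socket


def build_query(table, columns, filters=()):
    """Assemble a Livestatus Query Language request that asks for JSON output."""
    lines = ["GET %s" % table, "Columns: %s" % " ".join(columns)]
    lines += ["Filter: %s" % condition for condition in filters]
    lines.append("OutputFormat: json")
    # A blank line terminates the request.
    return "\n".join(lines) + "\n\n"


def query_livestatus(socket_path, query):
    """Send one query to the Livestatus UNIX socket and decode the JSON reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall(query.encode("utf-8"))
        sock.shutdown(socket.SHUT_WR)  # signal that the request is complete
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return json.loads(b"".join(chunks).decode("utf-8"))


# All services currently in the CRITICAL state (state 2).
critical_services = build_query(
    "services", ["host_name", "description", "state"], ["state = 2"]
)
print(critical_services)
# On the monitoring host:
# rows = query_livestatus("/var/run/nagios/rw/live", critical_services)
```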




  livecheck:

  It is a replacement for Nagios’s heavy fork() when running checks.

  Lightweight (100k RAM) helper process, called by Nagios to execute external
  applications.

  New, only in the latest distro; I have not used it yet.


  In nagios.cfg:
  broker_module=...../livestatus.o livecheck=/omd/sites/mysite/lib/mk-livestatus/livecheck




  OMD also includes the gearmand helper process, which works for me.




  MultiSite:

  It’s the CGI created by Mathias Kettner, the author of Check_MK; it is what you see in your browser.

  “Multisite allows each user to customize the builtin views or create completely new
  views. This is done in the GUI by flexibly combining datasources, layouts, filters, sortings,
  groupings, column-painters and inter-view-links. The idea behind is, that the
  administrators of the monitoring system should be able to create custom views for their
  users or customers, while those are presented a GUI as simple as possible.”

  Reads data from livestatus.o, so refresh is almost instant (triggered by Nagios events, I
  think).

  Makes “multi-site” Nagios monitoring trivially easy:
  • Set up more than one site using OMD (local or distributed)
  • Edit “multisite.mk”
  • Watch the world: I have Hong Kong, they have me.
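
The edit to multisite.mk is a Python dictionary of sites. The fragment below is only a sketch: the key names ("alias", "socket", "url_prefix") follow my recollection of Multisite's sites setting and should be checked against the documentation for your version. The Hong Kong host name is hypothetical, and port 6557 assumes Livestatus has been exported over TCP there.

```python
# Sketch of a multisite.mk "sites" entry; verify the key names before use.
sites = {
    # Local site: Multisite talks to the local Livestatus socket.
    "torrance": {
        "alias": "Torrance",
    },
    # Remote site, reached through Livestatus exported over TCP.
    "hongkong": {
        "alias": "Hong Kong",
        "socket": "tcp:hk-monitor.example.com:6557",               # hypothetical host
        "url_prefix": "http://hk-monitor.example.com/hongkong/",   # hypothetical URL
    },
}
```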

  Includes a configurable sidebar; I have not been there yet.




       Check_MK things we might not have touched on tonight:

       Python            I have not had to look at it yet!

       WATO              GUI management for Multisite

       Application-level monitoring           Aggregation, BI services

       Logwatch          grep your favorite logs and read from the GUI

       Windows           “check_mk_agent.exe install”

       NagVis            The management view – pays your bills!

       Mailing lists     sign up, they are active.







       I’m out of slides!

       Questions?


       Kelvin Vanderlip
       kelvin@vanderlip.org





Computer monitoring with the Open Monitoring Distribution

  • 1. Monitoring with Open Monitoring Distro TONIGHT’S OUTLINE: 1. Overview of Nagios 2. Check_MK 3. What is the “Open Monitoring Distribution”? 4. Operating a monitoring system Kelvin Vanderlip Oracle Linux systems administrator, Sunrider International, Torrance kelvin@vanderlip.org Kel Vanderlip 3-1-2012 UUASC-LA 1
  • 2. Monitoring with Open Monitoring Distro A thought for the night: Better to remain silent and be thought a fool than to speak out and remove all doubt. -- Abraham Lincoln (also attr. Confucius) Kel Vanderlip 3-1-2012 UUASC-LA 2
  • 3. Monitoring with Open Monitoring Distro In the beginning… Kel Vanderlip 3-1-2012 UUASC-LA 3
  • 4. Monitoring with Open Monitoring Distro Why do you care about monitoring? •You choose a job in which success depend on hard disks, NFS, DNS, DHCP, NIS, mgetty, Cron jobs, postfix, routing, FTP, swap space, fans, UPS systems, switches, CPU registers… Kel Vanderlip 3-1-2012 UUASC-LA 4
  • 5. Monitoring with Open Monitoring Distro So you ask you sole staffer “Is it running” He says “I don’t know. Can I install NetSaint?” Kel Vanderlip 3-1-2012 UUASC-LA 5
  • 6. Monitoring with Open Monitoring Distro Time passes: “NetSaint is not affiliated with World Wide Digital Security, Inc. (WWDSI); Richard S. Carson and Associates, Inc; and the marks WEB SAINT, SAINT, SAINTWRITER, SAINTEXPRESS, and SAINTBASIC owned by Richard S. Carson and Associate” Kel Vanderlip 3-1-2012 UUASC-LA 6
  • 7. Monitoring with Open Monitoring Distro Meet Ethan Galstadt: “This website stands as as testament to a long-running Open Source project that began with a simple idea in my mind. I had no inkling of the future success that NetSaint (and later Nagios) would come by. I almost never released it to the OSS community, but thank goodness I did. For without the constant flow of ideas from NetSaint and Nagios users, the project would have died off a long time ago. Cheers to everyone in the community who has participated in this project at some point in their life. My hat is off to you... -Ethan Galstad: Creator, Developer, Founder of NetSaint, Nagios, and Nagios Enterprises -and happy participant in a wider movement” Kel Vanderlip 3-1-2012 UUASC-LA 7
  • 8. Monitoring with Open Monitoring Distro As I said, how do you get started in the monitoring business? Kel Vanderlip 3-1-2012 UUASC-LA 8
  • 9. Monitoring with Open Monitoring Distro Your server room grows, and you are still asking yourself: “Is it still working?” Kel Vanderlip 3-1-2012 UUASC-LA 9
  • 10. Monitoring with Open Monitoring Distro All about Nagios: Nagios is a scheduling engine. It is written in C. In runs on Linux. Its an RPM and a DEB. Input: Text configuration files (lots and lots!) Output: Schedule many forks to run external monitoring applications, some locally, some on remote servers. Input: Each called monitoring application returns status and performance information Output: status.dat, a “snapshot” text file kept up to date several times a minute describing the last state for each thing Nagios is checking Kel Vanderlip 3-1-2012 UUASC-LA 10
  • 11. Monitoring with Open Monitoring Distro Status.dat is updated 3-6 times a minute: host { ######################################## host_name=Compellent # NAGIOS STATE RETENTION FILE modified_attributes=0 # check_command=check-mk-ping # THIS FILE IS AUTOMATICALLY GENERATED check_period=24X7 # BY NAGIOS. DO NOT MODIFY THIS FILE! notification_period=24X7 ######################################## event_handler= info { has_been_checked=1 created=1330182965 check_execution_time=0.013 version=3.2.3 check_latency=0.135 last_update_check=0 check_type=0 update_available=0 current_state=0 update_uid=1330021387 last_state=0 last_version= last_hard_state=0 new_version= last_event_id=0 } current_event_id=0 program { current_problem_id=0 modified_host_attributes=0 last_problem_id=0 modified_service_attributes=0 plugin_output=OK - 10.10.99.79: rta 0.785ms, lost 0% enable_notifications=1 long_plugin_output= active_service_checks_enabled=1 performance_data=rta=0.785ms;200.000;500.000;0; pl=0%;40;80;; rtmax=1.628ms;;;; rtmin=0.426ms;;;; passive_service_checks_enabled=1 last_check=1330182913 active_host_checks_enabled=1 next_check=1330182974 passive_host_checks_enabled=1 check_options=0 enable_event_handlers=1 current_attempt=1 obsess_over_services=0 max_attempts=1 obsess_over_hosts=0 normal_check_interval=1.000000 check_service_freshness=1 retry_check_interval=1.000000 check_host_freshness=0 state_type=1 enable_flap_detection=1 last_state_change=1330021647 enable_failure_prediction=1 last_hard_state_change=1330021647 process_performance_data=1 last_time_up=1330182914 global_host_event_handler= last_time_down=0 global_service_event_handler= last_time_unreachable=0 next_comment_id=40 notified_on_down=0 next_downtime_id=1 notified_on_unreachable=0 next_event_id=572 last_notification=0 next_problem_id=290 current_notification_number=0 next_notification_id=457 current_notification_id=0 } notifications_enabled=1 state_history=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 } Kel Vanderlip 
3-1-2012 UUASC-LA 11
  • 12. Monitoring with Open Monitoring Distro More status.dat: service { check_options=0 host_name=ebs-soa1 notified_on_unknown=0 service_description=CPU load notified_on_warning=0 modified_attributes=0 notified_on_critical=0 check_command=check_mk-cpu.loads current_notification_number=0 check_period=24X7 current_notification_id=0 notification_period=24X7 last_notification=0 event_handler= notifications_enabled=1 has_been_checked=1 active_checks_enabled=0 check_execution_time=0.000 passive_checks_enabled=1 check_latency=0.316 event_handler_enabled=0 check_type=1 problem_has_been_acknowledged=0 current_state=0 acknowledgement_type=0 last_state=0 flap_detection_enabled=1 last_hard_state=0 failure_prediction_enabled=1 last_event_id=0 process_performance_data=1 current_event_id=0 obsess_over_service=1 current_problem_id=0 is_flapping=0 last_problem_id=0 percent_state_change=0.00 current_attempt=1 check_flapping_recovery_notification=0 max_attempts=1 state_history=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 normal_check_interval=1.000000 } retry_check_interval=1.000000 state_type=1 last_state_change=1330021648 last_hard_state_change=1330021648 last_time_ok=1330182934 last_time_warning=0 last_time_unknown=0 last_time_critical=0 plugin_output=OK - 15min Load 0.19 at 8 CPUs long_plugin_output= performance_data=load1=0.25;40;80;0; load5=0.25;40;80;0; load15=0.19;40;80;0; last_check=1330182934 next_check=0 The file goes on for about 4 more megabytes. You will never read this. Kel Vanderlip 3-1-2012 UUASC-LA 12
  • 13. When you think of Nagios, you think of its web output:
    Nagios’ CGI is a visualization engine. It is written in C. It runs on Linux.
        Input: status.dat
        Output: web pages describing what’s in status.dat
        Input: mouse clicks from the operator
        Output: changes to what is viewed, and changes to Nagios’ current state
  • 15. There are lots of books about using Nagios. I have read most of them, and they all helped me out.
    A good Nagios implementation is a study in organizational behavior. If you run Nagios and find that no one else in the organization ever fixes anything based on Nagios findings, stop looking at the screen and start talking to people.
    Socially, using Nagios successfully forces you to involve your co-workers. They have to “buy in” to the Nagios outputs, which means they have to understand what it does and how it reports its findings.
    Managerially, keeping the email notification flood from Nagios under control is a prerequisite if you want anyone to actually use an email as a basis for corrective action.
    Festival (a loud speaker) and SMS work great. A report based on an SQL query works great. Creating a Navy-like “Officer of the watch” worked in the U.K.
  • 16. Doing Nagios means:
    Visit a server, and poke around. List what is important to check.
    For each important thing you want to check:
        Find or write some code to check it
        Set limits which your code can test to decide whether what you are checking is OK, or not
        Schedule the code to run over and over
        If the test is not OK, send a message to the interested party (email seems to be a favorite).
  • 17. Find or write some code to check it:
        Grab a check from the Nagios libexec apps (C, Perl, Python, bash) and put it where it can perform the check
    Set limits so your code can decide whether it’s OK or not:
        Configure command line parameters for the check where it will be called
    Schedule the code to run over and over:
        Reconfigure Nagios’s inputs to include the check and run it, perhaps using a transport mechanism
    If the test is not OK, send a message to the interested party:
        Nagios checks return state to Nagios, which can fork to send notifications
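To make "set limits" and "return state" concrete, here is a small Python sketch of the threshold logic inside a typical check. The exit codes 0, 1, 2 and 3 for OK, WARNING, CRITICAL and UNKNOWN are the standard Nagios plugin convention; the function names and the 90/95 defaults are assumptions chosen for illustration.

```python
# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value, warn, crit):
    """Map a measurement to a state: at or above a limit means that state."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def check_disk_percent(used_percent, warn=90, crit=95):
    """Hypothetical check: return (state, one-line plugin output)."""
    state = evaluate(used_percent, warn, crit)
    return state, f"{STATE_NAMES[state]} - {used_percent:.0f}% used"


state, message = check_disk_percent(41)
print(message)
# A real plugin would finish with sys.exit(state) so Nagios sees the return code.
```

The limits arrive as command line parameters in a real check, which is why changing them traditionally means editing the Nagios configuration.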
  • 18. Nagios transport systems:
        ACTIVE CHECKS
        PASSIVE CHECKS
        EXPORT STATE
        SSH WORKS AS WELL…
  • 19. PROBLEMS WITH THE TRADITIONAL NAGIOS APPROACH:
        How many times do you have to visit each server?
        How many times do you have to modify Nagios’s input files?
        How many times do you discover something you are not monitoring?
        Is all this worth it?
  • 20. Home is no better. Can you count the servers?
  • 21. Welcome to Check_MK!
  • 22. So how about a new approach to managing Nagios?
        1. Write a shell script which checks everything you can think of checking on a Linux box in one operation
        2. Send this script to each server once. Configure each server’s xinetd so that the script can be called using port 6556
        3. Remotely run this script and feed its output to a process which writes a separate Nagios configuration for each “service” found
        4. Schedule Nagios to run a single check once a minute: call the remote shell script over port 6556, and process the result in the “check” itself
        5. The check returns each individual “service” measurement it finds to Nagios by writing to the Nagios passive “external command file”
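Step 5 is easier to picture with the line format in hand. The sketch below builds one passive service result in the standard Nagios external command syntax, `[timestamp] PROCESS_SERVICE_CHECK_RESULT;host;service;return_code;output`. The helper name `format_passive_result` is hypothetical, and this is not Check_MK's own code, only an illustration of what ends up in the command file.

```python
import time


def format_passive_result(host, service, state, output, now=None):
    """Build one PROCESS_SERVICE_CHECK_RESULT line for the external command file."""
    timestamp = int(time.time()) if now is None else int(now)
    return f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{state};{output}\n"


# Values borrowed from the status.dat service block shown earlier.
line = format_passive_result(
    "ebs-soa1", "CPU load", 0, "OK - 15min Load 0.19 at 8 CPUs", now=1330182934
)
print(line, end="")
# Submitting it means appending the line to the Nagios command file (a named pipe);
# the location of that pipe depends on the installation, so it is left out here.
```

One such line per service per minute is how a single active check can fan out into dozens of passive results.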
  • 23. Write a shell script which checks everything you can think of checking on a Linux box in one operation:
        It is already written for you by M.K., for HP-UX, Linux and Windows, probably others.
    Send this script to each server once. Configure each server’s xinetd so that the script can be called using port 6556:
        Installing the shell script, creating directories, and reconfiguring and restarting xinetd are done for you by the check_mk_agent .rpm or .deb
  • 24. Getting the check_mk agent installed on a Linux box:
        OMD[torrance]:~$ scp /home/kelvinv/check_mk-agent-1.1.12p6-1.noarch.rpm root@ebsprod-is1:
        root@ebsprod-is1's password:
        OMD[torrance]:~/etc/check_mk$ ssh ebsprod-is1
        torrance@ebsprod-is1's password:
        [root@ebsprod-is1 ~]# rpm -Uhv check_mk-agent-1.1.12p6-1.noarch.rpm
        Preparing...          ########################################### [100%]
           1:check_mk-agent   ########################################### [100%]
        Activating startscript of xinetd
        Reloading xinetd...
        Reloading configuration: [ OK ]
  • 25. The start of the agent script:
        #!/bin/bash
        # +------------------------------------------------------------------+
        # |             ____ _               _        __  __ _  __           |
        # |            / ___| |__   ___  ___| | __   |  \/  | |/ /           |
        # |           | |   | '_ \ / _ \/ __| |/ /   | |\/| | ' /            |
        # |           | |___| | | |  __/ (__|   <    | |  | | . \            |
        # |            \____|_| |_|\___|\___|_|\_\___|_|  |_|_|\_\           |
        # |                                                                  |
        # | Copyright Mathias Kettner 2010             mk@mathias-kettner.de |
        # +------------------------------------------------------------------+
        #
        # This file is part of Check_MK.
        # The official homepage is at http://mathias-kettner.de/check_mk.
        #
        # check_mk is free software; you can redistribute it and/or modify it
        # under the terms of the GNU General Public License as published by
        # the Free Software Foundation in version 2. check_mk is distributed
        # in the hope that it will be useful, but WITHOUT ANY WARRANTY; with-
        # out even the implied warranty of MERCHANTABILITY or FITNESS FOR A
        # PARTICULAR PURPOSE. See the GNU General Public License for more de-
        # ails.

        # Remove locale settings to eliminate localized outputs where possible
        export LC_ALL=C
        unset LANG

        export MK_LIBDIR="/usr/lib/check_mk_agent"
        export MK_CONFDIR="/etc/check_mk"

        # Make sure, locally installed binaries are found
        PATH=$PATH:/usr/local/bin
  • 26. More tests in check_mk_agent.linux:
        echo '<<<check_mk>>>'
        echo Version: 1.1.12p6
        echo AgentOS: linux
        echo PluginsDirectory: $PLUGINSDIR
        echo LocalDirectory: $LOCALDIR
        echo AgentDirectory: $MK_CONFDIR

        # If we are called via xinetd, try to find only_from configuration
        if [ -n "$REMOTE_HOST" ]
        then
            echo -n 'OnlyFrom: '
            echo $(sed -n '/^service[[:space:]]*check_mk/,/}/s/^[[:space:]]*only_from[[:space:]]*=[[:space:]]*\(.*\)/\1/p' /etc/xinetd.d/* | head -n1)
        fi

        # Partitions (-P prevents line breaks for long mount points)
        # Note: NFS mounts are always excluded to avoid hangs.
        # They are better monitored on the server than on the
        # client anyway.
        echo '<<<df>>>'
        df -PTlk -x smbfs -x tmpfs -x cifs -x iso9660 -x udf -x nfsv4 | sed 1d

        # VMWare shows its own filesystems with 'vdf'. Just one
        # problem: it outputs not 7 but only 6 columns
        if which vdf > /dev/null
        then
            vdf -P | grep ^/vmfs/volumes | sed 's/ / vmfs /'
        fi
  • 27. More tests in check_mk_agent.linux:
        # Check mount options. Filesystems may switch to 'ro' in case
        # of a read error.
        echo '<<<mounts>>>'
        grep ^/dev < /proc/mounts

        # processes including username, without kernel processes
        echo '<<<ps>>>'
        ps ax -o user,vsz,rss,pcpu,command --columns 10000 | sed -e 1d -e 's/ *\([^ ]*\) *\([^ ]*\) *\([^ ]*\) *\([^ ]*\) */(\1,\2,\3,\4) /'
  • 28. Running check_mk_agent with “telnet <remote host> 6556”:
        Connected to nagios.
        Escape character is '^]'.
        <<<check_mk>>>
        Version: 1.1.12p6
        AgentOS: linux
        PluginsDirectory: /usr/lib/check_mk_agent/plugins
        LocalDirectory: /usr/lib/check_mk_agent/local
        AgentDirectory: /etc/check_mk
        OnlyFrom:
        <<<df>>>
        /dev/mapper/tom--rp--debian-root ext3 9607396 3714444 5404916 41% /
        /dev/xvda1 ext2 233191 30735 190015 14% /boot
        /dev/xvdb1 ext4 30961664 7003764 22385140 24% /opt
        <<<nfsmounts>>>
        <<<mounts>>>
        /dev/mapper/tom--rp--debian-root / ext3 rw,relatime,errors=remount-ro,data=ordered 0 0
        /dev/xvda1 /boot ext2 rw,relatime,errors=continue 0 0
        /dev/xvdb1 /opt ext4 rw,relatime,barrier=1,data=ordered 0 0
        <<<ps>>>
        (root,8356,808,0.0) init [2]
        (root,0,0,0.0) [kthreadd]
        (root,0,0,0.0) [migration/0]
        (root,0,0,0.0) [ksoftirqd/0]
        (root,0,0,0.0) [ksoftirqd/1]
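The agent output is plain text with `<<<name>>>` section headers, so the server side can be sketched in a few lines of Python. This is an illustrative sketch, not Check_MK's actual parser: `fetch_agent_output` and `parse_agent_output` are hypothetical names, and the only things taken from the slides are port 6556 and the section layout.

```python
import re
import socket

SECTION_HEADER = re.compile(r"^<<<([^>]+)>>>$")


def fetch_agent_output(host, port=6556, timeout=10.0):
    """Connect to the agent port and read everything until the agent closes."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_agent_output(text):
    """Group lines by section; each line becomes a list of whitespace-split words."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = SECTION_HEADER.match(line)
        if header:
            current = header.group(1)
            sections.setdefault(current, [])
        elif current is not None and line:
            sections[current].append(line.split())
    return sections


# Sample trimmed from the telnet session on the slide.
SAMPLE = """<<<check_mk>>>
Version: 1.1.12p6
AgentOS: linux
<<<df>>>
/dev/mapper/tom--rp--debian-root ext3 9607396 3714444 5404916 41% /
/dev/xvda1 ext2 233191 30735 190015 14% /boot
<<<nfsmounts>>>
"""

for device, fstype, size, used, avail, percent, mountpoint in parse_agent_output(SAMPLE)["df"]:
    print(mountpoint, percent)
```

Against a live host, `parse_agent_output(fetch_agent_output("ebsprod-is1"))` would do the same job as the telnet session, assuming xinetd allows the connection.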
  • 29. Remotely run this script and feed its output to a process which writes a separate Nagios configuration for each “service” found:
        Edit /opt/omd/<site>/etc/check_mk/main.mk to add a host, or use WATO, then:
        > check_mk -I <hostname>
    Schedule Nagios to run a single check once a minute: call the remote shell script over port 6556, and process the result in the “check” itself. The check returns each individual “service” measurement it finds to Nagios by writing to the Nagios passive “external command file”:
        > check_mk -O
    Update the whole Nagios configuration for a server which has a new configuration:
        > check_mk -II <hostname>
        > check_mk -O
    And loop forever…
  • 30. Again, to test and see what check_mk_agent will report on your server, install the RPM and then, locally, run:
        > telnet localhost 6556
    To see what configuration has been created for Nagios, look at these files on the Nagios server:
        > less /opt/omd/<site>/etc/nagios/conf.d/check_mk_objects.cfg
        > less /opt/omd/<site>/etc/nagios/conf.d/check_mk_templates.cfg
  • 31. Charts and graphs:
    Long-term performance history is important: SLAs, correlating things over time.
    Nagios keeps almost no history. MySQL can save history, but needs maintenance (it fills up and Nagios stalls). Besides, you still need to “visualize” what is going on.
    RRD is a great database service to keep temporal history. It never fills up. RRD includes visualization tools (graphs).
    Traditionally, it has been a job to incorporate RRD into Nagios, usually using 3rd party packages.
  • 32. “PNP is an addon to Nagios which analyzes performance data provided by plugins and stores them automatically into RRD-databases (Round Robin Databases, see RRD Tool). During development of PNP we set value on easy installation and little maintenance while running it. An administrator should do other things than configure graphing tools.”
  • 33. Besides configuring Nagios and RRD, other things an administrator should be doing include documentation.
    Wouldn’t it be great if you could move between the Nagios CGI screens, the PNP4Nagios charts and a documentation wiki?
    DokuWiki has worked for me. It is also used in OMD for users, passwords and privileges across OMD applications (e.g. NagViz).
    “DokuWiki is a standards compliant, simple to use Wiki, mainly aimed at creating documentation of any kind. It is targeted at developer teams, workgroups and small companies. It has a simple but powerful syntax which makes sure the datafiles remain readable outside the Wiki and eases the creation of structured texts. All data is stored in plain text files – no database is required.”
  • 34. Here’s the punch line:
  • 35. OMD Quick introduction
    First install the package matching your operating system:
        # zypper install omd-0.50-sles11sp1-25.x86_64.rpm
    Now create a monitoring instance (OMD calls this "site"):
        # omd create UULAC
    And let's start Nagios and all other processes:
        # omd start UULAC
    Other OMD features:
        • Run several monitoring sites in parallel
        • Install and use several different versions of OMD in parallel
        • Easily update, duplicate, rename and manage sites
  • 36. What OMD contains, Page 1
        nagios-3.2.3         The current version of Nagios
        nagios-plugins       Standard external apps which take and report measurements
        Nsca                 The listener for passive checks from remote servers
        check_nrpe           The check application which calls checks on remote hosts
        Shinken-0.6.99       (drop-in Nagios replacement, a whole world to explore)
        Nagvis               The management-level view of state: live maps, schematics
        Pnp4nagios           RRD and useful graphs. Compare services across hosts.
        rrdtool/rrdcached
        Check_MK             God’s gift to the sysadmin
        MK Livestatus        Replace status.dat with a callable data provider
        Multisite            Easily add additional monitoring sites.
  • 37. What OMD contains, Page 2
        Dokuwiki             A nice no-SQL wiki linked from Check_MK’s screens
        Thruk                A Perl CGI to view Nagios state (unexplored)
        Mod-Gearman          Process queue manager, reduces Nagios fork load
        check_logfiles       Locally read log files and report to Nagios
        check_oracle_health  Locally perform several Oracle DB checks
        check_mysql_health   Locally perform several MySQL DB checks
        Jmx4perl             (unexplored)
        check_webinject      wget-like web site checker, easy to use from Nagios
        check_multi          The all singing, all dancing, Python-writing Nagios check
  • 38. The Check_MK dashboard (actually called “Multisite”):
  • 39. You configure check_mk by editing ~/etc/check_mk/main.mk:
        # KCV Dec 2011
        snmp_default_community = 'public'
        snmp_communities = [
            ( "SunriderR0!", ["UCS"] ),
            ( "SunriderR0!", ["Compellent"] ),
        ]
        monitoring_host = "nagios"
        ntp_default_levels = (10, 80.0, 110.0)
        # hosts not added in WATO
        all_hosts = [
            "copy-server|linux|dev|tcp",
            "ebsprod-ap1|linux|dev|tcp",
            "ebsprod-ap2|linux|dev|tcp",
            "ebsprod-db1|linux|dev|tcp",
            "ebsprod-db2|linux|dev|tcp",
            "fortunedelight|linux|dev|tcp",
            "ip158|linux|dev|tcp",
            "istore-1|linux|dev|tcp",
            "istore-uat|linux|dev|tcp",
            "landing-page|windows|ping",
            "pci-kickstart|dev|linux|tcp",
            "soa11g|linux|dev|tcp",
            "xbiz1-ap1|linux|dev|tcp",
            "xbiz3-db1|linux|dev|tcp",
            "xuat1-is1|linux|dev|tcp",
        ]
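Each `all_hosts` entry is a host name followed by `|`-separated tags, and those tags are what the later rules (host groups, legacy checks) select on. A rough Python sketch of that idea, with hypothetical helper names and no claim to mirror Check_MK's internals:

```python
def split_host_entry(entry):
    """Split 'name|tag|tag' into the host name and its set of tags."""
    name, *tags = entry.split("|")
    return name, set(tags)


def hosts_with_tags(all_hosts, required_tags):
    """Return the names of hosts that carry every one of the required tags."""
    required = set(required_tags)
    return [
        name
        for name, tags in map(split_host_entry, all_hosts)
        if required <= tags
    ]


# A few entries copied from the slide; tag order does not matter.
EXAMPLE_HOSTS = [
    "copy-server|linux|dev|tcp",
    "landing-page|windows|ping",
    "pci-kickstart|dev|linux|tcp",
]

print(hosts_with_tags(EXAMPLE_HOSTS, ["linux", "dev"]))
```

This is why getting the tags right once in `all_hosts` pays off: every rule that names a tag picks the host up without further edits.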
  • 40. Getting anything else into Nagios’s cfg file:
        extra_nagios_conf += r"""
        define command {
            command_name check-ping
            command_line $USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5
        }
        define command{
            command_name check_dp_pool
            command_line $USER1$/check_dp_pool.pl -w $ARG1$ -c $ARG2$ $ARG3$
        }
        define command{
            command_name check_by_ssh_kel
            command_line $USER1$/check_by_ssh -H $HOSTADDRESS$ -t 20 -l kelvinv -C $ARG1$
        }
        define command{
            command_name check_sunrider_dig
            command_line $USER1$/check_dig -l $HOSTNAME$.sunrider.com -H 10.10.2.1
        }
        define command{
            command_name check_gearman
            command_line $USER1$/check_gearman -H localhost
        }
        define command{
            command_name check_http
            command_line $USER1$/check_http -H $HOSTADDRESS$ -s $ARG1$
        }
        """
  • 41. Modifying what check_mk -II writes into the Nagios .cfg files:
        extra_service_conf["notification_options"] = [
            ( "n", ALL_HOSTS, ["NTP Time"] ),
            ( "n", ALL_HOSTS, ["CUPS Queue.*"] ),
            ( "c,r", ALL_HOSTS, ["Ping"] ),
            ( "n", ALL_HOSTS, ["lpstat_queue"] ),
            ( "n", ALL_HOSTS, ["Gearman"] ),
        ]
        extra_service_conf["normal_check_interval"] = [
            ( "2", ["hp"], ALL_HOSTS, ["Check_MK"] ),
            ( "5", ["db"], ALL_HOSTS, ["ASM disk"] ),
            ( "3", ALL_HOSTS, ["YP client"] ),
        ]
  • 42. Add active checks (not run by check_mk_agent) into the Nagios cfg file:
        legacy_checks = [
            ( ( "check-ping!500,10%!1000,20%", "Ping", True), [ "tcp" ], ALL_HOSTS ),
            ( ( "check_sunrider_dig", "DNS Entry", True), [ "tcp" ], ALL_HOSTS ),
            ( ( "check_dp_pool!600000!300000!VAULT", "pool_VAULT", True), [ "sunuxdp" ] ),
            ( ( "check_gearman", "Gearman", True), [ "nagios" ] ),
            ( ( "check_http!'ibeCCtdMinisites.jsp?language=US'", "iStore-1 web", True), [ "istore-1" ] ),
            ( ( "check_by_ssh_kel!'/opt/nrpe/libexec/check_ypwhich.sh'", "YP client", True), [ "itauxap1", "itauxap2" ] ),
        ]
  • 43. Host groups are assigned by tag:
        host_groups = [
            ( "Production", [ "prod" ], ALL_HOSTS ),
            ( "Test", [ "test" ], ALL_HOSTS ),
            ( "Development", [ "dev" ], ALL_HOSTS ),
            ( "Production PCI", [ "prod_pci" ], ALL_HOSTS ),
            ( "Business Analyst", [ "ba" ], ALL_HOSTS ),
            ( "Backup", [ "backup" ], ALL_HOSTS ),
            ( "Database", [ "db" ], ALL_HOSTS ),
            ( "Application", [ "ap" ], ALL_HOSTS ),
            ( "Storage", [ "store" ], ALL_HOSTS ),
            ( "Monitors", [ "mon" ], ALL_HOSTS ),
            ( "Infrastructure", [ "infra" ], ALL_HOSTS ),
            ( "Networking", [ "net" ], ALL_HOSTS ),
            ( "Physical", [ "phy" ], ALL_HOSTS ),
            ( "VMware", [ "vmware" ], ALL_HOSTS ),
            ( "Xen", [ "xen" ], ALL_HOSTS ),
            ( "Oracle VM", [ "oravm" ], ALL_HOSTS ),
            ( "HP-UX", [ "hp" ], ALL_HOSTS ),
            ( "Linux", [ "linux" ], ALL_HOSTS ),
            ( "otheros", [ "Other OS" ], ALL_HOSTS ),
            ( "priv", [ "Private" ], ALL_HOSTS ),
        ]
  • 44. Tell check_mk to ignore data returned by check_mk_agent:
        ignored_services = [
            ( [ "eprodap1", "eprodap2", "eproddb1", "eproddb2", "ebspatch-ap2", "ebstest-ap1", "ebstest-db1", "ip94", "ip88pci" ],
              [ "IPMI Sensor Fan_Fan_[0-9]$" ] ),
            ( [ "eprodap1", "eprodap2", "eproddb1", "eproddb2", "ebspatch-ap2", "ebstest-ap1", "ebstest-db1", "ip94", "ip88pci" ],
              [ "IPMI Sensor Power_Unit_VRM_[0-9]$" ] ),
            ( [ "itauxap2" ], [ "Logical Volume /dev/vg00/lvol2$" ] ),
            ( [ "itauxdev" ], [ "asm_procs$" ], ),
        ]
  • 45. Tell check_mk to use these warn and critical parameters. They are used by check_mk, and the results are passed into Nagios as passive checks. They are not in the Nagios cfg files!
        check_parameters = [
            ( (90, 95), [ "copy-server" ], [ "fs_/u99$" ] ),
            ( (90, 95), [ "ebspatch-ap2" ], [ "fs_/u01$" ] ),
            ( (96, 98), [ "ebspatch-ap2" ], [ "fs_/u01/oracle/EBSTEST/db/apps_st/data$" ] ),
            ( (92, 96), [ "ebstest-ap1" ], [ "fs_/u01$" ] ),
            ( (93, 95), [ "ebs-ap1", "ebs-db1", "ebs-ap2", "ebs-db2" ], [ "fs_/u01$" ] ),
            ( (75, 150), [ "ebs-db3" ], [ "ORA EBSAP31 Sessions$" ] ),
            ( (90, 90), [ "eprodap1" ], [ "fs_/u01$" ] ),
            ( (85, 90), [ "eprodap1", "eprodap2" ], [ "fs_/home$" ] ),
            ( (85, 90), [ "eprodap1", "eprodap2", "eproddb1" ], [ "fs_/u01/storage$" ] ),
        ]
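A rough Python sketch of how a rule list like this could be evaluated. It rests on two assumptions that the slides do not state: that the first matching rule wins, and that service patterns are regular expressions anchored at the start of the service name. The function name `lookup_levels` is hypothetical.

```python
import re


def lookup_levels(rules, host, service, default=None):
    """Return the (warn, crit) pair of the first rule matching host and service."""
    for levels, hosts, patterns in rules:
        if host in hosts and any(re.match(pattern, service) for pattern in patterns):
            return levels
    return default


# Two rules copied from the slide.
EXAMPLE_RULES = [
    ((90, 95), ["ebspatch-ap2"], ["fs_/u01$"]),
    ((96, 98), ["ebspatch-ap2"], ["fs_/u01/oracle/EBSTEST/db/apps_st/data$"]),
]

print(lookup_levels(EXAMPLE_RULES, "ebspatch-ap2", "fs_/u01"))
print(lookup_levels(EXAMPLE_RULES, "ebspatch-ap2", "fs_/u01/oracle/EBSTEST/db/apps_st/data"))
```

Under those assumptions the trailing `$` matters: without it, the `fs_/u01` rule would also capture the longer mount point listed after it.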
  • 46. Copies of the check_mk_agent scripts are installed here. This demonstrates how OMD keeps its versions nicely separated.
        OMD[torrance]:~/etc/check_mk$ locate check_mk_agent
        /opt/omd/versions/0.50/share/check_mk/agents/check_mk_agent.aix
        /opt/omd/versions/0.50/share/check_mk/agents/check_mk_agent.freebsd
        /opt/omd/versions/0.50/share/check_mk/agents/check_mk_agent.hpux
        /opt/omd/versions/0.50/share/check_mk/agents/check_mk_agent.linux
        /opt/omd/versions/0.50/share/check_mk/agents/check_mk_agent.solaris
        /opt/omd/versions/0.50/share/check_mk/agents/windows/check_mk_agent.cc
        /opt/omd/versions/0.50/share/check_mk/agents/windows/check_mk_agent.exe
        /opt/omd/versions/0.50/share/doc/check_mk/treasures/check_mk_agent.hp
        /opt/omd/versions/0.52/share/check_mk/agents/check_mk_agent.aix
        /opt/omd/versions/0.52/share/check_mk/agents/check_mk_agent.freebsd
        /opt/omd/versions/0.52/share/check_mk/agents/check_mk_agent.hpux
        /opt/omd/versions/0.52/share/check_mk/agents/check_mk_agent.linux
        /opt/omd/versions/0.52/share/check_mk/agents/check_mk_agent.solaris
        /opt/omd/versions/0.52/share/check_mk/agents/windows/check_mk_agent.cc
        /opt/omd/versions/0.52/share/check_mk/agents/windows/check_mk_agent.exe
  • 47. Example of adding a custom check run on a remote host using ssh:
    In main.mk:
        legacy_checks = [
            ( ( "check_by_ssh_kel!'/opt/nrpe/libexec/check_ypwhich.sh'", "YP client", True), [ "itauxap1", "itauxap2" ] ),
        ]
        define command{
            command_name check_by_ssh_kel
            command_line $USER1$/check_by_ssh -H $HOSTADDRESS$ -t 20 -l kelvinv -C $ARG1$
        }
    On itauxap1 at /opt/nrpe/libexec/check_ypwhich.sh:
        #!/bin/sh
        # Report CRITICAL unless ypwhich names the expected NIS server.
        SERVER=`ypwhich`
        # Quote $SERVER so an empty value does not break the test
        if [ -z "$SERVER" ]
        then
            echo "CRITICAL: ypwhich NULL"
            exit 2
        fi
        if [ "$SERVER" != "itauxap1.sunrider.com" ]
        then
            echo "CRITICAL: ypwhich INCORRECT: $SERVER"
            exit 2
        fi
        echo "ypwhich OK: $SERVER"
        exit 0
  • 48. Another custom check, which returns the printer queue depth on HP-UX. KISS!
        # more check_lpstat-o.sh
        #!/bin/sh
        COUNT=`lpstat -o | wc -l`
        # Quote the whole line: the "|" separates status text from performance data
        echo "lpstat queue $COUNT | queue=$COUNT"
        exit 0
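The `|` in that echo line is the usual Nagios plugin convention: status text on the left, performance data on the right, with each metric written as `label=value;warn;crit;min;max`. Here is a minimal Python sketch of reading such a line. The function names are hypothetical, and quoted labels containing spaces are deliberately not handled.

```python
def split_plugin_output(line):
    """Separate the human-readable status text from the performance data."""
    text, _, perfdata = line.partition("|")
    return text.strip(), perfdata.strip()


def parse_perfdata(perfdata):
    """Return {label: value} for simple, unquoted perfdata labels.

    Values are kept as strings because they may carry a unit (e.g. 0.785ms).
    """
    metrics = {}
    for token in perfdata.split():
        label, _, rest = token.partition("=")
        metrics[label] = rest.split(";")[0]
    return metrics


text, perfdata = split_plugin_output("lpstat queue 3 | queue=3")
print(text, parse_perfdata(perfdata))

# The CPU load perfdata from the status.dat slide parses the same way.
print(parse_perfdata("load1=0.25;40;80;0; load5=0.25;40;80;0; load15=0.19;40;80;0;"))
```

Anything written after the `|` is what PNP4Nagios gets to graph, so even a three-line check can produce a chart.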
  • 49. livestatus.o:
    It is a replacement output method for Nagios’s status.dat.
    Like NDO, it uses the Nagios Event Broker API and loads as a module into Nagios.
    Unlike NDO, it does not write; it just responds to queries.
    Used by Check_MK Multisite, NagVis and Thruk to populate CGIs with data.
    Of course, it’s automatically set up in OMD…
    In nagios.cfg:
        broker_module=/usr/local/lib/mk-livestatus/livestatus.o /var/run/nagios/rw/live
    Anyone using “Livestatus Query Language”?
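For anyone who has not tried the Livestatus Query Language, here is a hedged Python sketch of a query over the UNIX socket. The socket path is the one from the nagios.cfg line on the slide; the helper names are hypothetical; and the response is assumed to be the default output (one row per line, columns separated by semicolons).

```python
import socket


def build_query(table, columns, filters=()):
    """Assemble an LQL request: GET line, optional headers, blank line to finish."""
    lines = [f"GET {table}", "Columns: " + " ".join(columns)]
    lines += [f"Filter: {condition}" for condition in filters]
    return "\n".join(lines) + "\n\n"


def parse_response(text):
    """Split the default output into rows of column values."""
    return [row.split(";") for row in text.splitlines() if row]


def run_query(query, socket_path="/var/run/nagios/rw/live"):
    """Send one query and read until Livestatus closes the connection."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as conn:
        conn.connect(socket_path)
        conn.sendall(query.encode("utf-8"))
        conn.shutdown(socket.SHUT_WR)  # signal that the request is complete
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


query = build_query("hosts", ["name", "state"], ["state = 0"])
print(query, end="")
# On the monitoring server: rows = parse_response(run_query(query))
```

Because the module answers from Nagios's in-memory state, a query like this avoids re-reading status.dat entirely.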
  • 50. livecheck:
    It is a replacement for Nagios’s heavy fork().
    A lightweight (100k RAM) helper process, called by Nagios to execute external applications.
    New, only in the latest distro; I have not used it yet.
    In nagios.cfg:
        broker_module=...../livestatus.o livecheck=/omd/sites/mysite/lib/mk-livestatus/livecheck
    OMD includes the gearmand helper process, which works for me.
  • 51. MultiSite: It’s the CGI created by Mathias (“Check_MK”); it is what you see in your browser.
    “Multisite allows each user to customize the builtin views or create completely new views. This is done in the GUI by flexibly combining datasources, layouts, filters, sortings, groupings, column-painters and inter-view-links. The idea behind is, that the administrators of the monitoring system should be able to create custom views for their users or customers, while those are presented a GUI as simple as possible.”
    Reads data from livestatus.o, so refresh is almost instant (triggered by Nagios events, I think).
    Allows “multi-site” Nagios monitoring to be trivially easy:
        • Set up more than one site using OMD (local or distributed)
        • Edit “multisite.mk”
        • Watch the world: I have Hong Kong, they have me.
    Includes a configurable sidebar; I have not been there yet.
  • 52. Check_MK things we might not have touched on tonight:
        Python                         I have not had to look at it yet!
        WATO                           GUI management for Multisite
        Application-level monitoring   Aggregation, BI services
        Logwatch                       grep your favorite logs and read from the GUI
        Windows                        “check_mk_agent.exe install”
        NagViz                         The management view: pays your bills!
        Mailing lists                  Sign up, they are active.
  • 53. I’m out of slides! Questions? Kelvin Vanderlip kelvin@vanderlip.org