Leveraging and Understanding
Performance Data and Graphs
Troy Lea
troy@box293.com
Twitter: @Box293
http://exchange.nagios.org/directory/Owner/Box293/1
2
About Me
IT Consultant
Nagios Developer
Love tinkering with Nagios
Why Nagios XI?
It’s a virtual appliance - ready to go
3
About This Presentation
Understanding how performance data is stored
in the back end and how Nagios accesses it
Goal is to give you key pieces of information
A good reference for understanding concepts
This presentation is centered around Nagios XI
Valid for other Nagios implementations
4
Basic Concepts - Part 1
5
Basic Concepts - Part 2
./check_nt -H SERVER -s "" -p 12489 -v USEDDISKSPACE -l C -w 80 -c 95
C: - total: 39.99 Gb - used: 25.28 Gb (63%) - free 14.71 Gb (37%) | 'C: Used Space'=25.28Gb;32.00;38.00;0.00;39.99
6
Basic Concepts - Part 3
Service check command is executed by the monitoring engine
Monitoring engine receives the result of the check
Data received has performance data
Performance data is anything after the | (pipe)
The performance data is inserted into an RRD file
When viewing the performance graph, PNP4Nagios retrieves the
performance data from the RRD file and generates a pretty graph
Every time the service check receives performance data, it inserts
this performance data into the RRD file which allows you to look at
trends over time
7
Plugins
The power of Nagios is in the plugins!
Monitor what you want, how you want!
Resources available that clearly define the
guidelines around creating plugins
Nagios Plug-in Developer Guidelines
http://nagiosplug.sourceforge.net/developer-guidelines.html
PNP Documentation
http://docs.pnp4nagios.org/pnp-0.4/doc_complete
8
Plugin Output Explained - Part 1
Plugins produce data divided into two parts
The pipe symbol “|” is used as a delimiter
Example check_icmp
OK - 127.0.0.1: rta 2.687ms, lost 0% | rta=2.687ms;3000.000;5000.000;0; pl=0%;80;100;;
Data to the left of the pipe symbol is processed
by the monitoring engine
Data to the right of the pipe symbol is used for
inserting into RRD and XML files
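The split at the pipe symbol can be tried out by hand. This is a minimal sketch in plain shell, not how the monitoring engine itself does it; status_text and perf_data are made-up helper names.

```shell
#!/bin/sh
# Split one line of plugin output at the first pipe symbol.
# status_text and perf_data are hypothetical helper names.

status_text() {
    # Everything before the first "|", with trailing spaces trimmed.
    printf '%s\n' "${1%%|*}" | sed 's/[[:space:]]*$//'
}

perf_data() {
    # Everything after the first "|", with leading spaces trimmed.
    # Prints an empty line when the output has no pipe at all.
    case $1 in
        *"|"*) printf '%s\n' "${1#*|}" | sed 's/^[[:space:]]*//' ;;
        *)     printf '\n' ;;
    esac
}

output="OK - 127.0.0.1: rta 2.687ms, lost 0% | rta=2.687ms;3000.000;5000.000;0; pl=0%;80;100;;"
status_text "$output"
perf_data "$output"
```

The first call prints the text the monitoring engine works with, the second prints the part that ends up in the RRD and XML files.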
9
Plugin Output Explained - Part 2
The exit code Nagios receives from the plugin
determines the state of the service
0 = OK
1 = WARNING
2 = CRITICAL
3 = UNKNOWN
The exit code is not “visible” when running a
check from the command line or looking at the
output returned from the plugin
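Although the exit code is not printed, the shell keeps it in $? straight after a command finishes, so running echo $? after a plugin shows it. A small sketch of the idea follows; fake_plugin stands in for a real plugin and state_name is a made-up helper that covers only the four codes listed above.

```shell
#!/bin/sh
# Map a plugin exit code to the state names listed above.
# state_name is a hypothetical helper name.
state_name() {
    case $1 in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        3) echo "UNKNOWN" ;;
        *) echo "NOT DEFINED" ;;   # codes outside 0-3 are not covered here
    esac
}

# Stand-in for a real plugin: prints one line and returns 1.
fake_plugin() {
    echo "WARNING - disk usage at 85%"
    return 1
}

code=0
fake_plugin || code=$?    # capture the exit code of the call
echo "exit code: $code ($(state_name "$code"))"
```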
10
Plugin Output Explained - Part 3
No performance data = no pretty graphs
You can create a plugin using whatever
language and tools are available
All that matters is the end result which is
returned back to Nagios when the plugin has
finished running
11
Plugin Output Explained - Part 4
Examples:
Shell script
Something you might want to check on the Nagios
host itself
perl script
Remotely checking a device using SNMP OR using
third party APIs like the VMware vSphere SDK to
remotely access virtual environments
Visual Basic script
Using NSClient on a Windows host to perform a
check (like RDP usage)
12
Performance Data Specifics - Part 1
Fields marked with an asterisk (*) are required, everything
else is optional
In this instance, rta is the FIRST DS, or DS 1
13
Performance Data Specifics - Part 2
Multiple DS
Each DS is separated by a space
rta=2.687ms;3000.000;5000.000;0; pl=0%;80;100;;
The label can have spaces however the label
MUST be enclosed by single quotes
'Round Trip Average'=2.687ms;3000.000;5000.000;0;
'Packet Loss'=0%;80;100;;
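To make the layout of one DS concrete, here is a rough sketch that pulls a single item apart. It assumes the field order seen in the examples above (label, value with unit, warning, critical, minimum, maximum) and handles one item at a time; parse_perf_item is a made-up name.

```shell
#!/bin/sh
# Break one performance data item into its fields.
# parse_perf_item is a hypothetical helper name; the field order is the
# one used in the examples above: label=value[UOM];warn;crit;min;max
parse_perf_item() (
    item=$1
    label=${item%%=*}          # text before the first "="
    rest=${item#*=}            # text after the first "="

    # Drop the single quotes that wrap labels containing spaces.
    label=${label#\'}
    label=${label%\'}

    # Split the remainder on semicolons; missing fields stay empty.
    IFS=';' read -r value warn crit min max <<EOF
$rest
EOF

    # Separate the number from its unit of measurement (ms, %, Gb, ...).
    number=${value%%[!0-9.-]*}
    uom=${value#"$number"}

    printf 'label=%s value=%s uom=%s warn=%s crit=%s min=%s max=%s\n' \
        "$label" "$number" "$uom" "$warn" "$crit" "$min" "$max"
)

parse_perf_item "'Round Trip Average'=2.687ms;3000.000;5000.000;0;"
parse_perf_item "pl=0%;80;100;;"
```

Splitting a full performance data string into items needs more care than splitting on spaces, because quoted labels can contain spaces themselves.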
14
Basic Plugin - Part 1
Example shell script demonstrating how a plugin
outputs performance data
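#!/bin/bash
# Pick two random numbers: 1-100 and 1-1000
NUMBER1=$(( ( RANDOM % 100 ) + 1 ))
NUMBER2=$(( ( RANDOM % 1000 ) + 1 ))
# Status text, then the pipe, then one performance data item per number
echo "OK - Number 1: $NUMBER1 Number 2: $NUMBER2 | 'Number 1'=$NUMBER1;;;; 'Number 2'=$NUMBER2;;;;"
exit 0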
15
Basic Plugin - Part 2
Here is the output each time it is run:
OK - Number 1: 4 Number 2: 74 | 'Number 1'=4;;;; 'Number 2'=74;;;;
OK - Number 1: 52 Number 2: 758 | 'Number 1'=52;;;; 'Number 2'=758;;;;
OK - Number 1: 73 Number 2: 60 | 'Number 1'=73;;;; 'Number 2'=60;;;;
OK - Number 1: 29 Number 2: 338 | 'Number 1'=29;;;; 'Number 2'=338;;;;
OK - Number 1: 87 Number 2: 612 | 'Number 1'=87;;;; 'Number 2'=612;;;;
16
Basic Plugin - Part 3
Performance data
displayed as a
pretty graph
Demonstration of
how you can
generate
performance data
in a plugin
17
Basic Plugin - Part 4
Now let's add warning and critical thresholds to
the performance data string
Number1
WARNING @ 50
CRITICAL @ 75
Number2
WARNING @ 500
CRITICAL @ 750
echo "OK - Number 1: $NUMBER1 Number 2: $NUMBER2 | 'Number 1'=$NUMBER1;50;75;; 'Number 2'=$NUMBER2;500;750;;"
18
Basic Plugin - Part 5
Here is the output each time it is run:
OK - Number 1: 4 Number 2: 74 | 'Number 1'=4;50;75;; 'Number 2'=74;500;750;;
OK - Number 1: 52 Number 2: 758 | 'Number 1'=52;50;75;; 'Number 2'=758;500;750;;
OK - Number 1: 73 Number 2: 60 | 'Number 1'=73;50;75;; 'Number 2'=60;500;750;;
OK - Number 1: 29 Number 2: 338 | 'Number 1'=29;50;75;; 'Number 2'=338;500;750;;
OK - Number 1: 87 Number 2: 612 | 'Number 1'=87;50;75;; 'Number 2'=612;500;750;;
19
Basic Plugin - Part 6
This demonstrates
how the
performance data
does not have any
effect on the state
of the service
Warning and
Critical thresholds
are inside the .xml
file
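For the thresholds to change the state, the plugin has to compare the value itself and return the matching exit code. The sketch below is one possible way to do that for a single number; check_number is a made-up function, and a real plugin would finish with exit instead of return.

```shell
#!/bin/sh
# check_number is a hypothetical plugin body: it compares the value with the
# thresholds itself, prints one line of output and returns the exit code.
check_number() {
    value=$1 warn=$2 crit=$3
    if [ "$value" -ge "$crit" ]; then
        state="CRITICAL"; code=2
    elif [ "$value" -ge "$warn" ]; then
        state="WARNING"; code=1
    else
        state="OK"; code=0
    fi
    # The thresholds are repeated in the performance data so they are
    # recorded with it, but it is the return code that sets the state.
    echo "$state - Number 1: $value | 'Number 1'=$value;$warn;$crit;;"
    return "$code"
}

check_number 52 50 75 || echo "exit code: $?"
```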
20
.rrd and .xml files
Used for recording the results from Nagios checks
Useful for observing daily trends of your environment
Invaluable for helping resolve performance issues
RRD = Round Robin Database
XML = Information about the Nagios check
PNP4Nagios uses the RRD and XML files to
generate pretty graphs
21
Location of .rrd and .xml files
When a service check returns performance data,
Nagios dumps this into:
/usr/local/nagios/var/spool/perfdata
A background process detects the spooled data
and creates / updates the relevant .rrd and .xml
The Performance Data files live in:
/usr/local/nagios/share/perfdata/<host>
22
Extract .rrd data
You can extract data from an .rrd file
Example (from the CLI):
rrdtool fetch /usr/local/nagios/share/perfdata/localhost/_HOST_.rrd MAX -r 900 -s -1h
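The fetched data comes back as rows of a timestamp followed by one value per DS, and rows with no data show up as nan. A small filter like the one below can hide those empty rows. The sample rows are made up for illustration and drop_empty_rows is a made-up name.

```shell
#!/bin/sh
# drop_empty_rows is a hypothetical helper. It reads text in the layout that
# rrdtool fetch prints (a "timestamp:" column followed by one value per DS)
# and keeps only the rows where at least one value is a real number.
drop_empty_rows() {
    awk '/^[0-9]+:/ {
        for (i = 2; i <= NF; i++)
            if ($i !~ /nan/) { print; next }
    }'
}

# Made-up sample in that layout; a real run would pipe rrdtool fetch here.
drop_empty_rows <<'EOF'
                          1                   2

1380000000: 2.6870000000e-03 0.0000000000e+00
1380000900: nan nan
1380001800: 3.1020000000e-03 0.0000000000e+00
EOF
```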
23
.rrd and .xml Gotchya - Part 1
The .xml file can contain sensitive data
<NAGIOS_SERVICECHECKCOMMAND>check_emc_clariion!$HOSTADDRESS$!-u
readonly!-p Str0ngPassw0rd!-t sp_cbt_busy!--sp A!--warn 70!--crit 90!
</NAGIOS_SERVICECHECKCOMMAND>
24
.rrd and .xml Gotchya - Part 2
Perhaps use a central credential file
<NAGIOS_SERVICECHECKCOMMAND>check_vmware_host!
check_vmware_config_vcenter01!cpu!90!95!!!!
</NAGIOS_SERVICECHECKCOMMAND>
25
.rrd and .xml Gotchya - Part 3
RRD Data is averaged out over time
Looking at performance graphs for past day / week /
month / year will show results with less spiky data
This generally only occurs with data that has lots of
peaks and troughs
Constant data like disk space used will generally not
average out that much
It all depends on your environment!
When reviewing RRD data you need to take these
factors into consideration. It’s all relative!
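A quick way to see the effect is to average a handful of numbers by hand. The sketch below uses made-up samples and a made-up helper, average_in_groups; it only illustrates the arithmetic and is not how rrdtool is implemented.

```shell
#!/bin/sh
# average_in_groups is a hypothetical helper: it reads one number per line and
# prints the average of every N consecutive values, which is roughly what
# happens when fine-grained samples are combined into coarser ones.
average_in_groups() {
    awk -v n="$1" '
        { sum += $1; count++ }
        count == n { printf "%.1f\n", sum / n; sum = 0; count = 0 }
    '
}

# Six made-up samples with one spike, averaged three at a time.
printf '%s\n' 10 10 90 10 10 10 | average_in_groups 3

# Constant data, such as disk space used, barely changes when averaged.
printf '%s\n' 25 25 25 | average_in_groups 3
```

The spike of 90 turns into 36.7 once it is averaged with its neighbours, while the constant series stays at 25.0.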
26
Graphs - How Templates Are Used - Part 1
http://docs.pnp4nagios.org/pnp-0.4/tpl
27
Graphs - How Templates Are Used - Part 2
PNP4Nagios queries the XML file for the
<TEMPLATE> tag
Each datasource has its own <TEMPLATE> tag
<TEMPLATE>check-host-alive</TEMPLATE>
Also can be a trailing string in the performance
data (good for distributed monitoring)
OK - 127.0.0.1: rta 2.687ms, lost 0% | rta=2.687ms;3000.000;5000.000;0; pl=0%;80;100;; [check_icmp]
28
Graphs - How Templates Are Used - Part 3
From the example graphs:
<TEMPLATE>check-host-alive</TEMPLATE>
<TEMPLATE>check_local_load_alt</TEMPLATE>
PNP4Nagios looks for a php file with this name
in the following folders:
/usr/local/nagios/share/pnp/templates.dist
/usr/local/nagios/share/pnp/templates
29
Graphs - How Templates Are Used - Part 4
check-host-alive
/usr/local/nagios/share/pnp/templates.dist/check-host-alive.php
This PHP file generates the performance graph
check_local_load_alt
check_local_load_alt.php does NOT exist
Default template is used:
/usr/local/nagios/share/pnp/templates.dist/default.php
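The lookup can be pictured as a simple search with a fallback. The sketch below is only an illustration of that idea, not PNP4Nagios code; find_template is a made-up name, and trying the templates folder before templates.dist is an assumption.

```shell
#!/bin/sh
# find_template is a hypothetical helper. It prints the first
# <dir>/<name>.php that exists, trying the directories in the order given,
# and falls back to default.php in the last directory.
find_template() (
    name=$1
    shift
    for dir in "$@"; do
        if [ -f "$dir/$name.php" ]; then
            printf '%s\n' "$dir/$name.php"
            exit 0
        fi
        last=$dir
    done
    printf '%s\n' "$last/default.php"
)

pnp=/usr/local/nagios/share/pnp
# Search order (templates first) is an assumption made for this sketch.
find_template check-host-alive "$pnp/templates" "$pnp/templates.dist"
find_template check_local_load_alt "$pnp/templates" "$pnp/templates.dist"
```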
30
Graphs - Creating Your Own Template - Part 1
The check_command name is what Nagios uses
to insert into the <TEMPLATE> tag in the XML
file (how PNP determines which template to use)
So for this example I have created a copy of an
existing command
check_xi_service_nsclient_alt
31
Graphs - Creating Your Own Template - Part 2
The service definition using the new command
32
Graphs - Creating Your Own Template - Part 3
The graph currently being generated
Default Template being used
Check Command being used
.rrd and .xml files currently contain valid data
33
Graphs - Creating Your Own Template - Part 4
Copy the file:
/usr/local/nagios/share/pnp/templates.dist/default.php
To the following location with the name:
/usr/local/nagios/share/pnp/templates/check_xi_service_nsclient_alt.php
Edit check_xi_service_nsclient_alt.php
34
Graphs - Creating Your Own Template - Part 5
In the graph we are removing the bottom two lines
Default Template
Check Command command name
Which are lines 62 and 63
$def[$i] .= 'COMMENT:"Default Template\r" ';
$def[$i] .= 'COMMENT:"Check Command ' . $TEMPLATE[$i] . '\r" ';
Save check_xi_service_nsclient_alt.php
35
Graphs - Creating Your Own Template - Part 6
How easy was that!
Updated graph
Template Name and Check Command removed
36
PNP Templates In Detail - Part 1
Let's get into specifics
Template we just
modified
It’s not that
complicated! (LOL)
37
PNP Templates In Detail - Part 2
.rrd files can have multiple datasources (DS)
Round Trip Time and Packet Loss for example
38
PNP Templates In Detail - Part 3
Example of .rrd file with five DS
Two graphs generated using these DS
39
PNP Templates In Detail - Part 4
Default Template creates one graph per DS
This is a simple PHP foreach loop
The code within the loop references the relevant
DS by the $i variable
40
PNP Templates In Detail - Part 5
This section of the template uses three DS
One graph will be generated using three DS
$opt[1] and $def[1] is a reference for the first graph
being generated
41
PNP Templates In Detail - Part 6
Number formatting
Our modified template and the relative code
The relevant information:
%3.4lf
42
PNP Templates In Detail - Part 7
The three DS template and the relative code
The relevant information:
%4.0lf
43
PNP Templates In Detail - Part 8
Numbers are displayed with four decimal places
%3.4lf
Numbers are displayed as whole numbers
%4.0lf
44
PNP Templates In Detail - Part 9
PNP documentation defines the number
formatting using the printf standard defined here
http://en.wikipedia.org/wiki/Printf
The number (1) and the letter "L" look alike
%3.4lf contains a lower case "L"
The syntax is
%[parameter][flags][width][.precision][length]type
45
PNP Templates In Detail - Part 10
width
When the number is generated on the graph, it will
allocate a minimum specific width, this helps you
align numbers in a column style
precision
Determines if the number displayed is a whole
number, or a number with a specific number of digits
following the decimal place
46
PNP Templates In Detail - Part 11
%3.4lf
width = 3
precision = .4
hence the displayed number is 25.3800
%4.0lf
width = 4
precision = .0
hence the displayed number is 14
Because the precision is 0, NO decimal place is used
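The same width and precision rules can be tried from the CLI with the shell's own printf. In this sketch the "l" length modifier is left out because shell printf does not need it, the square brackets are only there to make the padding visible, and format_number is a made-up helper.

```shell
#!/bin/sh
# Use the C locale so the decimal separator is always a dot.
LC_ALL=C
export LC_ALL

# format_number is a hypothetical helper: $1 is a printf format, $2 a number.
format_number() {
    printf "$1\n" "$2"
}

format_number '[%3.4f]' 25.38    # four digits after the decimal place
format_number '[%4.0f]' 14.2     # whole number, padded to a width of 4
```

The width is a minimum, so 25.3800 is printed in full even though it is wider than 3, while 14 is padded with two leading spaces to reach a width of 4.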
47
MRTG - Part 1
MRTG = Multi Router Traffic Grapher
Nagios Addon that is useful for monitoring
network switch and router bandwidth using SNMP
Can be complicated to understand configuration
48
MRTG - Part 2
Nagios XI Wizard called “Network Switch /
Router” automates the configuration of MRTG
MRTG configuration file
/etc/mrtg/mrtg.cfg
MRTG runs as a cron job every five minutes
cron comes from the Greek word for time, χρόνος
[chronos]
Hence cron is a software utility on Linux which is a
time-based job scheduler
In the Windows world it's the Task Scheduler
49
MRTG - Part 3
When MRTG runs, it gathers data from the
devices defined in the mrtg.cfg file
It dumps this data into the folder
/var/lib/mrtg
For every port monitored, an .rrd file is created
(no .xml file created at this point)
Another background process will then take the
data in /var/lib/mrtg and put it into the correct
location
/usr/local/nagios/share/perfdata/<host>
50
MRTG Gotchya - Part 1
When the Wizard populates the mrtg.cfg file it will
add ALL ports on the switch to the config file
Even if you only selected to monitor 10 ports on
the switch
The Nagios XI Service Configuration will only have 10
ports defined as service definitions
Every time the MRTG cron job runs, it will collect
data from all ports on the switch (as defined in the
mrtg.cfg file)
Extra CPU cycles, extra disk space
51
MRTG Gotchya - Part 2
On a 48 port switch this might not concern you
But in a stack of two 48 port switches this
becomes 96 ports + also other internal ports like
link aggregation ports (another 32 ports perhaps)
So these additional 128 ports have now added
8700+ configuration lines to the mrtg.cfg file
128 ports consume about 24 MB of .rrd disk
space
In my past environment, the mrtg.cfg file was
59,000 lines long!
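The figures above give a rough per-port cost. This back-of-the-envelope sketch uses only the numbers quoted on this slide, with integer division, so treat the results as estimates.

```shell
#!/bin/sh
ports=$(( 96 + 32 ))                   # two 48 port switches plus 32 internal ports
lines_per_port=$(( 8700 / ports ))     # mrtg.cfg lines per port (integer division)
kb_per_port=$(( 24 * 1024 / ports ))   # 24 MB of .rrd space expressed in KB
echo "ports=$ports lines_per_port=$lines_per_port kb_per_port=$kb_per_port"
```

That works out to roughly 67 configuration lines and 192 KB of .rrd space for every port, whether or not the port is being monitored.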
52
MRTG Gotchya - Part 3
Suggestion
Clean up the mrtg.cfg file
Remove the ports you do not wish to gather data on
Can this cause Problems?
Yes!
Problem 1
Monitoring additional ports later using the wizard will
not work
The wizard will NOT re-add the ports to the mrtg.cfg file
Wizard detects switch / router is already in the mrtg.cfg file
53
MRTG Gotchya - Part 4
Problem 2 - Adding a switch (or module) to an
existing switch
Monitoring additional ports later using the wizard will
not work
The wizard will NOT add newly detected ports to the
mrtg.cfg file
Wizard detects switch / router is already in the mrtg.cfg file
Very similar behaviour to Problem 1
Only relevant when the new switch / module is managed
through the existing IP Address / FQDN
Common with stacked switches, adding another switch to
the stack
54
MRTG Gotchya - Part 5
Solutions to Problems 1 & 2
cfgmaker
This is how the Wizard configures mrtg.cfg
The wizard updates the existing mrtg.cfg using a php
function (not available from the CLI)
Run cfgmaker @ CLI to generate a config file
Add the contents of the config file to the existing mrtg.cfg
cfgmaker --noreversedns "public@192.168.1.1" --output=output.txt
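Adding the generated file to the existing mrtg.cfg can be as simple as appending it, ideally after taking a backup. The sketch below is one cautious way to do that; append_config is a made-up helper and it does not check for duplicate entries, so review output.txt before appending it.

```shell
#!/bin/sh
# append_config is a hypothetical helper: it keeps a backup copy of the
# existing file, then appends the newly generated section to it.
append_config() {
    new=$1 target=$2
    cp -p "$target" "$target.bak" || return 1
    cat "$new" >> "$target"
}

# Example call, using the file names from the cfgmaker command above
# (commented out so the sketch does not touch a live system):
# append_config output.txt /etc/mrtg/mrtg.cfg
```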
55
MRTG Gotchya - Part 6
Problem 3 - With a frequently changing
environment, keep mrtg.cfg clean
Monitoring WAN links for remote routers?
WAN link no longer exists?
Disable / Delete service definition(s) in Core Configuration
Manager (CCM)
You will NEED to remove device from mrtg.cfg
Why?
MRTG will still try and collect data from WAN links no longer
accessible
Causes delays and can make MRTG run past the default 5
minute schedule ... can cause graph anomalies
56
MRTG Gotchya - Part 7
Problem 4 - Firmware Upgrade causes port
numbering to change
Major firmware revision applied to switch / router
New data collected for ports is no longer the same pattern
Internal port numbering has changed
mrtg.cfg queries specific port numbers, does not use port
names or descriptions
Example
Old Firmware: WAN = Port 1 LAN = Port 2
New Firmware: WAN = Port 0 LAN = Port 1
Have seen this behaviour on SonicWALL Firewalls
57
Questions
Questions ?
58
Discount Offer
But wait, there's more ...
When visiting Nagios XI, use my affiliate link
http://www.nagios.com/#ref=3oHG00

Boost Fertility New Invention Ups Success Rates.pdf
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 

Nagios Conference 2013 - Troy Lea - Leveraging and Understanding Performance Data and Graphs

  • 1. Leveraging and Understanding Performance Data and Graphs Troy Lea troy@box293.com Twitter: @Box293 http://exchange.nagios.org/directory/Owner/Box293/1
  • 2. 2 About Me IT Consultant Nagios Developer Love tinkering with Nagios Why Nagios XI? It’s a virtual appliance - ready to go
  • 3. 3 About This Presentation Understanding how performance data is stored in the back end and how Nagios accesses it Goal is to give you key pieces of information A good reference for understanding concepts This presentation is centered around Nagios XI Valid for other Nagios implementations
  • 5. 5 Basic Concepts - Part 2 ./check_nt -H SERVER -s "" -p 12489 -v USEDDISKSPACE -l C -w 80 -c 95 C: - total: 39.99 Gb - used: 25.28 Gb (63%) - free 14.71 Gb (37%) | 'C: Used Space'=25.28Gb;32.00;38.00;0.00;39.99
  • 6. 6 Basic Concepts - Part 3 Service check command is executed by the monitoring engine Monitoring engine receives the result of the check Data received has performance data Performance data is anything after the | (pipe) The performance data is inserted into an RRD file When viewing the performance graph, PNP4Nagios retrieves the performance data from the RRD file and generates a pretty graph Every time the service check receives performance data, it inserts this performance data into the RRD file which allows you to look at trends over time
  • 7. 7 Plugins The power of Nagios is in the plugins! Monitor what you want, how you want! Resources available that clearly define the guidelines around creating plugins Nagios Plug-in Developer Guidelines http://nagiosplug.sourceforge.net/developer-guidelines.html PNP Documentation http://docs.pnp4nagios.org/pnp-0.4/doc_complete
  • 8. 8 Plugin Output Explained - Part 1 Plugins produce data divided into two parts The pipe symbol “|” is used as a delimiter Example check_icmp OK - 127.0.0.1: rta 2.687ms, lost 0% | rta=2.687ms;3000.000;5000.000;0; pl=0%;80;100;; Data to the left of the pipe symbol is processed by the monitoring engine Data to the right of the pipe symbol is used for inserting into RRD and XML files
  • 9. 9 Plugin Output Explained - Part 2 The exit code Nagios receives from the plugin determines the state of the service 0 = OK 1 = WARNING 2 = CRITICAL 3 = UNKNOWN The exit code is not “visible” when running a check from the command line or looking at the output returned from the plugin
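Although the exit code is not printed, the shell keeps it in `$?` immediately after a command finishes. The following is a minimal sketch, assuming a POSIX shell; `state_name` is a hypothetical helper written for illustration and is not part of Nagios, and `sh -c 'exit 2'` merely stands in for a real plugin.

```shell
#!/bin/sh
# Hypothetical helper: map a plugin exit code to the Nagios state name.
state_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        3) echo "UNKNOWN" ;;
        *) echo "out of range" ;;
    esac
}

# Stand-in for a plugin that finishes with exit code 2.
code=0
sh -c 'exit 2' || code=$?

# The exit code is only visible if you ask the shell for it.
echo "exit code $code = $(state_name "$code")"
```

When testing a real plugin from the command line, running `echo $?` as the very next command shows the same value.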
  • 10. 10 Plugin Output Explained - Part 3 No performance data = no pretty graphs You can create a plugin using whatever language and tools are available All that matters is the end result which is returned back to Nagios when the plugin has finished running
  • 11. 11 Plugin Output Explained - Part 4 Examples: Shell script Something you might want to check on the Nagios host itself perl script Remotely checking a device using SNMP OR using third party APIs like the VMware vSphere SDK to remotely access virtual environments Visual Basic script Using NSClient on a Windows host to perform a check (like RDP usage)
  • 12. 12 Performance Data Specifics - Part 1 Asterisk (*) fields are required fields, everything else is optional In this instance, rta is the FIRST DS, or DS 1
  • 13. 13 Performance Data Specifics - Part 2 Multiple DS Each DS is separated by a space rta=2.687ms;3000.000;5000.000;0; pl=0%;80;100;; The label can have spaces; however, the label MUST then be enclosed in single quotes 'Round Trip Average'=2.687ms;3000.000;5000.000;0; 'Packet Loss'=0%;80;100;;
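To make the field layout concrete, here is a small sketch that splits the check_icmp output at the pipe and picks fields out of the first DS. It assumes a POSIX shell and unquoted labels without spaces; `ds_field` is a hypothetical helper, and PNP4Nagios does its own parsing rather than anything like this.

```shell
#!/bin/sh
# Sample plugin output from the check_icmp example.
output="OK - 127.0.0.1: rta 2.687ms, lost 0% | rta=2.687ms;3000.000;5000.000;0; pl=0%;80;100;;"

# Text before the first pipe is the status text,
# text after it is the performance data.
status_text=${output%%|*}
perfdata=${output#*|}
perfdata=${perfdata# }          # drop the space that follows the pipe

# First DS only (labels with spaces would need quote-aware parsing).
first_ds=${perfdata%% *}
label=${first_ds%%=*}

# Hypothetical helper: print one semicolon-separated field of a DS.
# Fields: 1=value with unit, 2=warning, 3=critical, 4=min, 5=max
ds_field() {
    ds=${1#*=}                  # drop "label="
    echo "$ds" | cut -d';' -f"$2"
}

echo "status text: $status_text"
echo "label: $label"
echo "value: $(ds_field "$first_ds" 1)"
echo "warning: $(ds_field "$first_ds" 2)"
echo "critical: $(ds_field "$first_ds" 3)"
```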
  • 14. 14 Basic Plugin - Part 1 Example shell script demonstrating how a plugin outputs performance data NUMBER1=$(( ( RANDOM % 100 ) + 1 )) NUMBER2=$(( ( RANDOM % 1000 ) + 1 )) echo "OK - Number 1: $NUMBER1 Number 2: $NUMBER2 | 'Number 1'=$NUMBER1;;;; 'Number 2'=$NUMBER2;;;;" exit 0
  • 15. 15 Basic Plugin - Part 2 Here is the output each time it is run: OK - Number 1: 4 Number 2: 74 | 'Number 1'=4;;;; 'Number 2'=74;;;; OK - Number 1: 52 Number 2: 758 | 'Number 1'=52;;;; 'Number 2'=758;;;; OK - Number 1: 73 Number 2: 60 | 'Number 1'=73;;;; 'Number 2'=60;;;; OK - Number 1: 29 Number 2: 338 | 'Number 1'=29;;;; 'Number 2'=338;;;; OK - Number 1: 87 Number 2: 612 | 'Number 1'=87;;;; 'Number 2'=612;;;;
  • 16. 16 Basic Plugin - Part 3 Performance data displayed as a pretty graph Demonstration of how you can generate performance data in a plugin
  • 17. 17 Basic Plugin - Part 4 Now let's add warning and critical thresholds to the performance data string Number1 WARNING @ 50 CRITICAL @ 75 Number2 WARNING @ 500 CRITICAL @ 750 echo "OK - Number 1: $NUMBER1 Number 2: $NUMBER2 | 'Number 1'=$NUMBER1;50;75;; 'Number 2'=$NUMBER2;500;750;;"
  • 18. 18 Basic Plugin - Part 5 Here is the output each time it is run: OK - Number 1: 4 Number 2: 74 | 'Number 1'=4;50;75;; 'Number 2'=74;500;750;; OK - Number 1: 52 Number 2: 758 | 'Number 1'=52;50;75;; 'Number 2'=758;500;750;; OK - Number 1: 73 Number 2: 60 | 'Number 1'=73;50;75;; 'Number 2'=60;500;750;; OK - Number 1: 29 Number 2: 338 | 'Number 1'=29;50;75;; 'Number 2'=338;500;750;; OK - Number 1: 87 Number 2: 612 | 'Number 1'=87;50;75;; 'Number 2'=612;500;750;;
  • 19. 19 Basic Plugin - Part 6 This demonstrates how the performance data does not have any effect on the state of the service Warning and Critical thresholds are inside the .xml file
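Because the thresholds in the performance data string are only recorded for graphing, a plugin that should actually change state has to compare the value itself and return the matching exit code. A minimal sketch, assuming a POSIX shell, integer values, and that a value at or above a threshold triggers it; `check_number` is a hypothetical function, not taken from the presentation.

```shell
#!/bin/sh
# Hypothetical check: compare a value against thresholds, print the
# plugin output line, and return the matching Nagios exit code.
check_number() {
    value=$1
    warn=$2
    crit=$3
    if [ "$value" -ge "$crit" ]; then
        state="CRITICAL"; code=2
    elif [ "$value" -ge "$warn" ]; then
        state="WARNING"; code=1
    else
        state="OK"; code=0
    fi
    # The same thresholds are repeated in the performance data for the graph.
    echo "$state - Number 1: $value | 'Number 1'=$value;$warn;$crit;;"
    return "$code"
}

# 52 is at or above the warning threshold of 50 but below 75.
check_number 52 50 75 || echo "exit code: $?"
```

A real plugin would finish with `exit $?` straight after calling the function so that Nagios receives the code.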
  • 20. 20 .rrd and .xml files Used for recording the results from Nagios checks Useful for observing daily trends of your environment Invaluable for helping resolve performance issues RRD = Round Robin Database XML = Information about the Nagios check PNP4Nagios uses the RRD and XML files to generate pretty graphs
  • 21. 21 Location of .rrd and .xml files When a service check returns performance data, Nagios dumps this into: /usr/local/nagios/var/spool/perfdata A background process detects the spooled data and creates / updates the relevant .rrd and .xml The Performance Data files live in: /usr/local/nagios/share/perfdata/<host>
  • 22. 22 Extract .rrd data You can extract data from an .rrd file Example (from the CLI): rrdtool fetch /usr/local/nagios/share/perfdata/localhost/_HOST_.rrd MAX -r 900 -s -1h
  • 23. 23 .rrd and .xml Gotchya - Part 1 The .xml file can contain sensitive data <NAGIOS_SERVICECHECKCOMMAND>check_emc_clariion!$HOSTADDRESS$!-u readonly!-p Str0ngPassw0rd!-t sp_cbt_busy!--sp A!--warn 70!--crit 90! </NAGIOS_SERVICECHECKCOMMAND>
  • 24. 24 .rrd and .xml Gotchya - Part 2 Perhaps use a central credential file <NAGIOS_SERVICECHECKCOMMAND>check_vmware_host! check_vmware_config_vcenter01!cpu!90!95!!!! </NAGIOS_SERVICECHECKCOMMAND>
  • 25. 25 .rrd and .xml Gotchya - Part 3 RRD Data is averaged out over time Looking at performance graphs for past day / week / month / year will show results with less spiky data This generally only occurs with data that has lots of peaks and troughs Constant data like disk space used will generally not average out that much It all depends on your environment! When reviewing RRD data you need to take these factors into consideration; it’s all relative!
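The flattening effect is plain arithmetic: once several samples are merged into one averaged value, a short spike is diluted by the quiet samples around it. The sketch below illustrates this with made-up numbers, assuming a POSIX shell with awk; `consolidate` is a hypothetical helper and a simplification, not rrdtool's actual consolidation algorithm.

```shell
#!/bin/sh
# Average each group of N samples, similar in spirit to an RRD
# AVERAGE consolidation.
# Usage: consolidate N value...
consolidate() {
    n=$1
    shift
    printf '%s\n' "$@" | LC_ALL=C awk -v n="$n" '
        { sum += $1; count++ }
        count == n { printf "%.1f\n", sum / n; sum = 0; count = 0 }'
}

# Twelve samples with one short spike to 100, merged four at a time.
# The spike of 100 shows up as only 25.0 in the consolidated data.
consolidate 4 0 0 100 0 10 10 10 10 0 0 0 0
```

Steady values such as the run of 10s come through unchanged, which is why constant data like disk space used looks much the same at every time range.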
  • 26. 26 Graphs - How Templates Are Used - Part 1 http://docs.pnp4nagios.org/pnp-0.4/tpl
  • 27. 27 Graphs - How Templates Are Used - Part 2 PNP4Nagios queries the XML file for the <TEMPLATE> tag Each datasource has its own <TEMPLATE> tag <TEMPLATE>check-host-alive</TEMPLATE> Also can be a trailing string in the performance data (good for distributed monitoring) OK - 127.0.0.1: rta 2.687ms, lost 0% | rta=2.687ms;3000.000;5000.000;0; pl=0%;80;100;; [check_icmp]
  • 28. 28 Graphs - How Templates Are Used - Part 3 From the example graphs: <TEMPLATE>check-host-alive</TEMPLATE> <TEMPLATE>check_local_load_alt</TEMPLATE> PNP4Nagios looks for a php file with this name in the following folders: /usr/local/nagios/share/pnp/templates.dist /usr/local/nagios/share/pnp/templates
  • 29. 29 Graphs - How Templates Are Used - Part 4 check-host-alive /usr/local/nagios/share/pnp/templates.dist/check-host-alive.php This PHP file generates the performance graph check_local_load_alt check_local_load_alt.php does NOT exist Default template is used: /usr/local/nagios/share/pnp/templates.dist/default.php
  • 30. 30 Graphs - Creating Your Own Template - Part 1 The check_command name is what Nagios uses to insert into the <TEMPLATE> tag in the XML file (how PNP determines which template to use) So for this example I have created a copy of an existing command check_xi_service_nsclient_alt
  • 31. 31 Graphs - Creating Your Own Template - Part 2 The service definition using the new command
  • 32. 32 Graphs - Creating Your Own Template - Part 3 The graph currently being generated Default Template being used Check Command being used .rrd and .xml files currently contain valid data
  • 33. 33 Graphs - Creating Your Own Template - Part 4 Copy the file: /usr/local/nagios/share/pnp/templates.dist/default.php To the following location with the name: /usr/local/nagios/share/pnp/templates/check_xi_service_nsclient_alt.php Edit check_xi_service_nsclient_alt.php
  • 34. 34 Graphs - Creating Your Own Template - Part 5 In the graph we are removing the bottom two lines Default Template Check Command command name Which are lines 62 and 63 $def[$i] .= 'COMMENT:"Default Template\r" '; $def[$i] .= 'COMMENT:"Check Command ' . $TEMPLATE[$i] . '\r" '; Save check_xi_service_nsclient_alt.php
  • 35. 35 Graphs - Creating Your Own Template - Part 6 How easy was that! Updated graph Template Name and Check Command removed
  • 36. 36 PNP Templates In Detail - Part 1 Let's get into specifics Template we just modified It’s not that complicated! (LOL)
  • 37. 37 PNP Templates In Detail - Part 2 .rrd files can have multiple datasources (DS) Round Trip Time and Packet Loss for example
  • 38. 38 PNP Templates In Detail - Part 3 Example of .rrd file with five DS Two graphs generated using these DS
  • 39. 39 PNP Templates In Detail - Part 4 Default Template creates one graph per DS This is a simple PHP foreach loop The code within the loop references the relevant DS by the $i variable
  • 40. 40 PNP Templates In Detail - Part 5 This section of the template uses three DS One graph will be generated using three DS $opt[1] and $def[1] are references to the first graph being generated
  • 41. 41 PNP Templates In Detail - Part 6 Number formatting Our modified template and the relevant code The relevant information: %3.4lf
  • 42. 42 PNP Templates In Detail - Part 7 The three DS template and the relevant code The relevant information: %4.0lf
  • 43. 43 PNP Templates In Detail - Part 8 Numbers are displayed with four decimal places %3.4lf Numbers are displayed as whole numbers %4.0lf
  • 44. 44 PNP Templates In Detail - Part 9 PNP documentation defines the number formatting using the printf standard defined here http://en.wikipedia.org/wiki/Printf The number (1) and the letter "L" look alike %3.4lf contains a lower case "L" The syntax is %[parameter][flags][width][.precision][length]type
  • 45. 45 PNP Templates In Detail - Part 10 width When the number is generated on the graph, it is allocated at least this many characters, which helps you align numbers in columns precision Determines whether the number displayed is a whole number, or a number with a specific number of digits following the decimal point
  • 46. 46 PNP Templates In Detail - Part 11 %3.4lf width = 3 precision = .4 hence the displayed number is 25.3800 %4.0lf width = 4 precision = .0 hence the displayed number is 14 Because the precision is 0, NO decimal place is used
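The width and precision rules can be tried out without touching a template, because the shell's own printf follows the same convention. This is a sketch under one stated assumption: the shell's printf is given `%f`, whereas the template strings use `%lf`; the brackets are only there to make the padding visible.

```shell
#!/bin/sh
# Keep "." as the decimal separator regardless of the system locale.
export LC_ALL=C

# Precision 4: four digits after the decimal point. The result is wider
# than the minimum width of 3, so no padding is added.
printf '[%3.4f]\n' 25.38    # [25.3800]

# Precision 0: no decimal point at all. The minimum width of 4 pads
# the two digits with two leading spaces.
printf '[%4.0f]\n' 14.2     # [  14]

# A larger width makes the alignment effect easier to see.
printf '[%8.2f]\n' 14.2     # [   14.20]
```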
  • 47. 47 MRTG - Part 1 MRTG = Multi Router Traffic Grapher Nagios Addon that is useful for monitoring network switch and router bandwidth using SNMP Can be complicated to understand configuration
  • 48. 48 MRTG - Part 2 Nagios XI Wizard called “Network Switch / Router” automates the configuration of MRTG MRTG configuration file /etc/mrtg/mrtg.cfg MRTG runs as a cron job every five minutes cron comes from the Greek word for time, χρόνος [chronos] Hence cron is a software utility on Linux which is a time-based job scheduler In the Windows world it's the Task Scheduler
  • 49. 49 MRTG - Part 3 When MRTG runs, it gathers data from the devices defined in the mrtg.cfg file It dumps this data into the folder /var/lib/mrtg For every port monitored, an .rrd file is created (no .xml file created at this point) Another background process will then take the data in /var/lib/mrtg and put it into the correct location /usr/local/nagios/share/perfdata/<host>
  • 50. 50 MRTG Gotchya - Part 1 When the Wizard populates the mrtg.cfg file it will add ALL ports on the switch to the config file Even if you only selected to monitor 10 ports on the switch The Nagios XI Service Configuration will only have 10 ports defined as service definitions Every time the MRTG cron job runs, it will collect data from all ports on the switch (as defined in the mrtg.cfg file) Extra CPU cycles, extra disk space
  • 51. 51 MRTG Gotchya - Part 2 On a 48 port switch this might not concern you But in a stack of two 48 port switches this becomes 96 ports + also other internal ports like link aggregation ports (another 32 ports perhaps) So these additional 128 ports have now added 8700+ configuration lines to the mrtg.cfg file 128 ports consume about 24 MB of .rrd disk space In my past environment, the mrtg.cfg file was 59,000 lines long!
  • 52. 52 MRTG Gotchya - Part 3 Suggestion Clean up the mrtg.cfg file Remove the ports you do not wish to gather data on Can this cause Problems? Yes! Problem 1 Monitoring additional ports later using the wizard will not work The wizard will NOT re-add the ports to the mrtg.cfg file Wizard detects switch / router is already in the mrtg.cfg file
  • 53. 53 MRTG Gotchya - Part 4 Problem 2 - Adding a switch (or module) to an existing switch Monitoring additional ports later using the wizard will not work The wizard will NOT add newly detected ports to the mrtg.cfg file Wizard detects switch / router is already in the mrtg.cfg file Very similar behaviour to Problem 1 Only relevant when the new switch / module is managed through the existing IP Address / FQDN Common with stacked switches, adding another switch to the stack
  • 54. 54 MRTG Gotchya - Part 5 Solutions to Problems 1 & 2 cfgmaker This is how the Wizard configures mrtg.cfg The wizard updates the existing mrtg.cfg using a php function (not available from the CLI) Run cfgmaker @ CLI to generate a config file Add the contents of the config file to the existing mrtg.cfg cfgmaker --noreversedns "public@192.168.1.1" --output=output.txt
  • 55. 55 MRTG Gotchya - Part 6 Problem 3 - With a frequently changing environment, keep mrtg.cfg clean Monitoring WAN links for remote routers? WAN link no longer exists? Disable / Delete service definition(s) in Core Configuration Manager (CCM) You will NEED to remove device from mrtg.cfg Why? MRTG will still try and collect data from WAN links no longer accessible Causes delays and can make MRTG run past the default 5 minute schedule ... can cause graph anomalies
  • 56. 56 MRTG Gotchya - Part 7 Problem 4 - Firmware Upgrade causes port numbering to change Major firmware revision applied to switch / router New data collected for ports is no longer the same pattern Internal port numbering has changed mrtg.cfg queries specific port numbers, does not use port names or descriptions Example Old Firmware: WAN = Port 1 LAN = Port 2 New Firmware: WAN = Port 0 LAN = Port 1 Have seen this behaviour on SonicWALL Firewalls
  • 58. 58 Discount Offer But wait, there's more ... When visiting the Nagios XI use my affiliate link http://www.nagios.com/#ref=3oHG00

Editor's Notes

  1. Good afternoon all and thank you for coming to my session. My name is Troy Lea and I'm here to talk to you about leveraging and understanding performance data and graphs in Nagios.
  2. First a little about me. I’m primarily a Windows tech, starting back in DOS 6 and Windows 3.1. I’ve worked in a variety of support roles over the years and my last role involved the development and maintenance of a cloud computing platform based on Windows Remote Desktop. I primarily looked after the backend infrastructure. I've been using Nagios XI since 2009. I originally tried Nagios before XI was released; however, being a Windows guy, there were some Linux barriers that I just could not get my head around. I love Nagios XI because it is delivered as a virtual appliance. Within minutes of importing that VM and powering it on you have a fully functional monitoring product. Before I caught the Nagios bug, my programming experience was all Windows related: batch files, VB scripts and PowerShell. I had dabbled in a little HTML but only because I had to. Since then I've learnt HTML, PHP, CSS, JavaScript, Perl, Bash ... whatever is required to get the result I needed.
  3. In the world of monitoring there is more to Nagios than sending alerts because a server is about to run out of hard disk space. Collecting and storing performance data is one of the most useful features in Nagios; with this information you can get an understanding of your environment's day to day trends. Analysing this data can be very helpful, perhaps to look at growth or to identify performance bottlenecks. This session is about understanding how the performance data is stored in the back end and how Nagios accesses it. Topics covered in this session are: • Basic concepts • Understanding the .rrd and .xml files • Understanding how pnp generates graphs • Creating custom graph templates in pnp • Writing plugins that will output the performance data you want • Understanding how MRTG works Everything I will talk about is documented on the Internet; however, that information does not always appear on the first page of your Google search results. It's especially difficult when you are learning a new language or concept: the information out there is not always helpful, or it can get overwhelming. Even though this is an advanced technical session, it's aimed at delivering the core concepts and information to help you get the results you need (and impress the boss). As I've mentioned before, I'm primarily a Windows tech. Some of the material I talk about might be obvious to a Linux tech, but to a Windows tech it can get frustrating, so my goal here is to make the content accessible to anyone. This presentation is centered around Nagios XI. There are references to locations of files and components; your implementation of Nagios may differ slightly, however the concepts are still the same.
  4. I'll start off by quickly explaining the basic concepts. Let's look at a common service that is used in monitoring, a free disk space check. Here is the service configuration and the current service status.
  5. Here is this command and the output we see when we execute it from the CLI. The data after the pipe symbol is the performance data, I will explain this in more detail later on. Here is the Advanced Status Detail of the service showing the performance data string.
  6. Here is the performance graph for this service, the end result. The chain of events that occur are ...
  7. When I first began using Nagios, it became apparent that the power behind Nagios came with plugins. The ability to monitor what you want, how you want, using a variety of different methods really appealed to me. I think everyone who starts developing plugins for Nagios has a very similar journey: we modify an existing plugin to make it suit our environment, we then create a simple plugin using an existing one to do something completely different, and before we know it we are writing very complex plugins. There are two exceptional resources available that clearly define the guidelines around creating plugins. Nagios Plug-in Developer Guidelines http://nagiosplug.sourceforge.net/developer-guidelines.html The information here is very clear and easy to understand, and I refer to it constantly. PNP Documentation http://docs.pnp4nagios.org/pnp-0.4/doc_complete This has some more detailed information and examples in relation to the performance data and how it needs to be formatted.
  8. Taken directly from the PNP documentation: When the plugin produces performance data, it is divided into two parts. The pipe symbol ("|") is used as a delimiter. Example check_icmp: OK - 127.0.0.1: rta 2.687ms, lost 0% | rta=2.687ms;3000.000;5000.000;0; pl=0%;80;100;; Something I want to make really clear here is: The data to the left of the pipe symbol is processed by the monitoring engine. The data to the right of the pipe symbol is used for inserting into RRD files for performance data.
  9. The only information not shown here is the exit code Nagios receives from the plugin that determines the state of the check 0 = OK 1 = WARNING 2 = CRITICAL 3 = UNKNOWN
  10. If your plugin does not output performance data, then graphs will not be available for that service. So it's as basic as that. You can create your plugin using whatever language you need to, as it fits your purpose and needs. All that matters is the end result which is returned back to Nagios when the plugin has finished running.
  11. Shell script Something you might want to check on the Nagios host itself perl script Remotely checking a device using SNMP OR using third party APIs like the VMware vSphere SDK to remotely access virtual environments visual basic script Using NSClient on a Windows host to perform a check (like RDP usage)
  12. Here is a breakdown of the performance data. The asterisk (*) fields are required fields, everything else is optional. In this instance, rta is the FIRST datasource, or datasource 1.
  13. A plugin can output multiple datasources. Each datasource is separated by a space and the format is the same. Example: rta=2.687ms;3000.000;5000.000;0; pl=0%;80;100;; The label can have spaces if you desire; however, the label MUST then be enclosed in single quotes. Example: 'Round Trip Average'=2.687ms;3000.000;5000.000;0; 'Packet Loss'=0%;80;100;;
  14. Here is a basic plugin I have created to demonstrate outputting performance data using a shell script. This is just a simple script that generates two random numbers and outputs them. For demonstration purposes this script will always return an OK state. NUMBER1=$(( ( RANDOM % 100 ) + 1 )) NUMBER2=$(( ( RANDOM % 1000 ) + 1 )) echo "OK - Number 1: $NUMBER1 Number 2: $NUMBER2 | 'Number 1'=$NUMBER1;;;; 'Number 2'=$NUMBER2;;;;" exit 0
  15. Here is the output each time it is run:
  16. Here are the graphs displayed after the check has been running for a while.
  17. Now I am going to define a warning and critical threshold in the performance data string; this will show you how they appear in the graphs. Number1 WARNING @ 50 CRITICAL @ 75 Number2 WARNING @ 500 CRITICAL @ 750 echo "OK - Number 1: $NUMBER1 Number 2: $NUMBER2 | 'Number 1'=$NUMBER1;50;75;; 'Number 2'=$NUMBER2;500;750;;"
  18. Here is the output each time it is run:
  19. This demonstrates how the performance data does not have any effect on the state of the services.   Also, if you were to look into the XML file generated for this service, this is where the warning and critical thresholds are stored.  
  20. What are Performance Data Files? Performance data files are used for recording the results from Nagios checks, which in turn become useful for observing the daily trends of your environment. Being able to look at hourly/daily/weekly/monthly/yearly historical data can be invaluable when trying to resolve performance issues. It helps get to the bottom of those customer complaints like "the server is slow". There are two files created by Nagios for every check that generates performance data. The RRD file is a Round Robin Database. That means that after some time the oldest data will be dropped at the "end" and it will be replaced by new values "at the beginning". This is the file that contains all the historical data. The XML file contains detailed information about the check that generated the performance data: things like warning and critical thresholds and the names of the checks. This file is updated at the same time as the RRD file, so it always holds the information from when the check was last run. How are these files used? When you are viewing performance graphs in Nagios, they are generated by an application called PNP4Nagios. PNP4Nagios uses the XML and RRD files to generate these graphs. PNP4Nagios allows you to create your own customised graphs based on the information in the XML file and then displays the historical data in the RRD file. A couple of service checks need to run initially to collect performance data before you will see performance graphs. How long it takes to see data in the performance graphs depends on the frequency of your service checks.
  21. Initially, when a service check returns performance data, Nagios dumps this into: /usr/local/nagios/var/spool/perfdata Another background process will then detect this spooled perfdata and create/update the relevant .rrd and .xml files. The Performance Data files live in: /usr/local/nagios/share/perfdata/<host> There is a folder for each host. The host object files are called _HOST_ (the check_icmp command that determines if a host is up or down). All the other files are relevant to the service objects defined for each host.
  22. If you want to extract the data from an .rrd file you can do it with the following command: rrdtool fetch /usr/local/nagios/share/perfdata/localhost/_HOST_.rrd MAX -r 900 -s -1h If you don’t specify start and end times the data retrieved will be from the past 1 day.
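If you want to work with the fetched values in a script, the text output can be parsed. The sketch below assumes the usual layout of rrdtool fetch output (a header line of datasource names, a blank line, then one "timestamp: values" row per step). The sample timestamps and values are made up, and parse_fetch is a hypothetical helper name.

```python
import math

# Illustrative sample of `rrdtool fetch` output; timestamps and values are made up.
SAMPLE = """\
                          1                   2

1380000000: 2.6870000000e+00 0.0000000000e+00
1380000900: 3.1200000000e+00 0.0000000000e+00
1380001800: nan nan
"""

def parse_fetch(text):
    """Return (datasource_names, rows) where each row is (timestamp, [values])."""
    lines = [line for line in text.splitlines() if line.strip()]
    names = lines[0].split()
    rows = []
    for line in lines[1:]:
        stamp, _, rest = line.partition(":")
        values = [float(v) for v in rest.split()]  # "nan" becomes float("nan")
        rows.append((int(stamp), values))
    return names, rows

names, rows = parse_fetch(SAMPLE)
# Keep only the rows where every datasource has a known value.
known = [row for row in rows if not any(math.isnan(v) for v in row[1])]
```

Rows containing nan are steps where no data had been recorded, so they are usually filtered out before doing any calculations.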
  23. The .xml file can contain sensitive data.   When the .xml file is created/updated, a lot of information relevant to the check command that was run is stored in it, and this could include a password in plain text.   For example, here is a service check that has a password stored in the definition, and here is the line in the .xml file:   <NAGIOS_SERVICECHECKCOMMAND>check_emc_clariion!$HOSTADDRESS$!-u readonly!-p Str0ngPassw0rd!-t sp_cbt_busy!--sp A!--warn 70!--crit 90!</NAGIOS_SERVICECHECKCOMMAND>
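One way to find out whether you are affected is to scan the .xml files for arguments that look like credentials. The following is a rough Python sketch: SAMPLE_XML is a heavily simplified, hypothetical stand-in for a real PNP .xml file, and the list of suspect prefixes is an assumption you would adjust for the plugins you use.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical stand-in for a PNP .xml file; real files hold many more tags.
SAMPLE_XML = """<NAGIOS>
  <NAGIOS_SERVICECHECKCOMMAND>check_emc_clariion!$HOSTADDRESS$!-u readonly!-p Str0ngPassw0rd!-t sp_cbt_busy!--sp A!--warn 70!--crit 90!</NAGIOS_SERVICECHECKCOMMAND>
</NAGIOS>"""

# Assumed argument prefixes that often carry credentials.
SUSPECT_PREFIXES = ("-p ", "--password")

def suspicious_args(xml_text):
    """Return the check command arguments that look like they hold a password."""
    root = ET.fromstring(xml_text)
    command = root.findtext("NAGIOS_SERVICECHECKCOMMAND", default="")
    args = command.split("!")[1:]  # the first field is the command name
    return [arg for arg in args if arg.startswith(SUSPECT_PREFIXES)]

found = suspicious_args(SAMPLE_XML)
```

A match is only a hint: some plugins use -p for a port, so review each hit by hand.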
  24. There are many methods to work around this behaviour if you are not comfortable with it. For example, this service check uses a file that contains the credentials, and you can see that the credentials are not inside the .xml file:   <NAGIOS_SERVICECHECKCOMMAND>check_vmware_host!check_vmware_config_vcenter01!cpu!90!95!!!!</NAGIOS_SERVICECHECKCOMMAND>
  25. RRD data is averaged out over time. When you look at performance graphs for the past day / week / month / year, the longer ranges show less spiky data. This is most noticeable with data that has lots of peaks and troughs: the troughs pull the overall average down, so the peaks appear lower. Something like active user sessions will have a peak through business hours and then a drop to almost nothing out of hours. Constant data like disk space used will generally not average out that much. It all depends on your environment! When reviewing RRD data you need to take these factors into consideration, as it's all relative.
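A small worked example makes the flattening effect concrete. This Python sketch averages made-up five-minute samples into half-hour points; the numbers and the consolidate helper are illustrative only and do not reflect the actual archive settings used by Nagios XI.

```python
# Hypothetical active-session counts, one sample every 5 minutes for one hour.
samples = [0, 0, 0, 120, 60, 0, 12, 0, 0, 0, 0, 0]

def consolidate(values, step):
    """Average each run of `step` consecutive values into a single point."""
    return [sum(values[i:i + step]) / step for i in range(0, len(values), step)]

half_hourly = consolidate(samples, 6)  # 2 points instead of 12
print(max(samples), max(half_hourly))
```

The raw data peaks at 120 sessions, but after averaging the highest point is only 30, because the surrounding zeros drag the average down.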
  26. When you are viewing performance graphs in Nagios, they are generated by an application called PNP4Nagios.   Here are two examples. The difference between the two graphs is that the first one has a PNP template and hence it's a little prettier, whereas the second graph is generic and tells you that it is using the Default Template.
  27. So how does this work? http://docs.pnp4nagios.org/pnp-0.4/tpl   When the RRD and XML files are created / updated, the check_command directive* defined in the service object is added to the XML file under each <DATASOURCE> tag as the TEMPLATE tag.   In relation to distributed monitoring, if PNP finds a string enclosed in brackets at the end of the performance data, it is recognized as the check command and is used as the PNP template: OK - 127.0.0.1: rta 2.687ms, lost 0% | rta=2.687ms;3000.000;5000.000;0; pl=0%;80;100;; [check_icmp]   When PNP goes to display the graph, it queries the XML file and gets the TEMPLATE tag for each datasource.
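The structure of that output line can be illustrated by splitting it apart. This Python sketch is not PNP's own parser; split_plugin_output is a hypothetical helper that separates the text, the performance data and the optional bracketed check command.

```python
import re

def split_plugin_output(line):
    """Split plugin output into (text, perfdata, template_hint)."""
    text, _, perfdata = line.partition("|")
    perfdata = perfdata.strip()
    template = None
    # A trailing [name] is treated as the check command / template hint.
    match = re.search(r"\[([^\]]+)\]\s*$", perfdata)
    if match:
        template = match.group(1)
        perfdata = perfdata[:match.start()].strip()
    return text.strip(), perfdata, template

line = ("OK - 127.0.0.1: rta 2.687ms, lost 0% | "
        "rta=2.687ms;3000.000;5000.000;0; pl=0%;80;100;; [check_icmp]")
text, perfdata, template = split_plugin_output(line)
```

When the output has no bracketed suffix, template stays None and the check_command from the service definition would be used instead.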
  28. For the example graphs shown on the previous slides, these values are: <TEMPLATE>check-host-alive</TEMPLATE> and <TEMPLATE>check_local_load_alt</TEMPLATE>   So the template names are: check-host-alive and check_local_load_alt   PNP then looks in the following folders to see if it can find a PHP file that has one of these names: /usr/local/nagios/share/pnp/templates.dist /usr/local/nagios/share/pnp/templates
  29. In the first example above it finds the following file: /usr/local/nagios/share/pnp/templates.dist/check-host-alive.php So it uses this PHP file to generate the performance graph   In the second example above it cannot find any file named check_local_load_alt.php so it uses the default template which is: /usr/local/nagios/share/pnp/templates.dist/default.php
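The lookup with its fallback can be sketched as follows. This is a Python illustration rather than PNP's actual PHP code, find_template is a hypothetical name, and the search order (custom templates folder first, then templates.dist) is an assumption. The demo uses throwaway directories instead of the real paths.

```python
from pathlib import Path
import tempfile

def find_template(name, search_dirs, default_name="default"):
    """Return the first <name>.php found in search_dirs, else the default template."""
    for directory in search_dirs:
        candidate = Path(directory) / f"{name}.php"
        if candidate.is_file():
            return candidate
    for directory in search_dirs:
        fallback = Path(directory) / f"{default_name}.php"
        if fallback.is_file():
            return fallback
    return None

# Throwaway directories standing in for templates and templates.dist.
with tempfile.TemporaryDirectory() as tmp:
    custom = Path(tmp) / "templates"
    dist = Path(tmp) / "templates.dist"
    custom.mkdir()
    dist.mkdir()
    (dist / "check-host-alive.php").touch()
    (dist / "default.php").touch()
    hit = find_template("check-host-alive", [custom, dist]).name
    miss = find_template("check_local_load_alt", [custom, dist]).name
```

Here hit resolves to the matching template, while miss falls back to default.php because no file named after the check command exists.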
  30. Creating your own templates isn't too hard, but it is a little complex and will require some trial and error.   The best starting point is to find an existing template and modify it to your liking.   As described in the previous slides, the name of the check_command is what gets inserted into the <TEMPLATE> tag in the XML file (this is how PNP determines which template to use).   So for this example I have created a copy of an existing command and called it "check_xi_service_nsclient_alt". You can see the command is identical to the original command except for the name.
  31. Here is the service that I want to view custom graphs for. You can see it is now using the new command.
  32. And here is the graph being generated by this service. You can see it is currently using the default template and it is also telling you the check command.   So that's our starting point: we know the data currently exists in the RRD and XML files and we are ready to create our custom template.
  33. Copy the file: /usr/local/nagios/share/pnp/templates.dist/default.php   To the following location with the name: /usr/local/nagios/share/pnp/templates/check_xi_service_nsclient_alt.php   Edit the file check_xi_service_nsclient_alt.php
  34. I am going to remove the bottom two lines of the graph: "Default Template" and "Check Command command name".   These are produced by lines 62 and 63 of the file: $def[$i] .= 'COMMENT:"Default Template\r" '; $def[$i] .= 'COMMENT:"Check Command ' . $TEMPLATE[$i] . '\r" ';   Save the file, then reload the performance graph to see the new template.
  35. Reload the performance graph and we will see the new template. The blue arrow I've added to the graph shows where the template name and command name used to be.   How easy was that!
  36. Now I'll get a little more technical.   Here is the modified template we just created.   There are a few sections in here that can seem overwhelming, but once you understand them it's not that complicated.
  37. An RRD file can have multiple data sources. An example of this is the check-host-alive command, which is a ping test used for host definitions. The performance data returned from this check contains two datasources: Round Trip Time and Packet Loss.   When you view the graphs for this check you actually see two graphs. Each datasource increases the size of the .rrd file.
  38. Here is a check command that generates five datasources, and the PNP template uses these to generate two performance graphs. The first graph uses three datasources and the second graph uses two datasources.
  39. So, going back to the template we modified. The default template is designed to create one graph per datasource. It does this by looking at the RRD file, looping through each datasource and generating a graph for it.   This is a simple PHP foreach loop, and the code within the loop references the relevant datasource through the $i variable. So that's how individual graphs can be generated for each datasource in a generic fashion.
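The idea of that loop can be mimicked outside of PHP. The Python sketch below is a stand-in for the template's foreach loop, not a copy of it: build_graph_defs is a hypothetical name, the datasource names "1" and "2" are assumed for illustration, and the graph arguments are reduced to a single DEF and LINE1 per graph.

```python
def build_graph_defs(rrd_file, datasources):
    """Return one list of rrdtool graph arguments per datasource."""
    graphs = {}
    for i, ds in enumerate(datasources, start=1):
        # i plays the role of $i in the template: one graph per datasource.
        graphs[i] = [
            f"DEF:var1={rrd_file}:{ds}:AVERAGE",
            "LINE1:var1#000000",
        ]
    return graphs

graphs = build_graph_defs("_HOST_.rrd", ["1", "2"])
```

Because nothing inside the loop is specific to one check, the same code works for any number of datasources.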
  40. In a previous slide I showed you a check command that generated five datasources, where the first graph contained three of these datasources. Because I created the check command, I know that it will always output five datasources in the performance data and that they will always be output in the same numerical order. I will explain this in further detail later on when we get to the section on creating your own plugins.   Here is the first part of the template that shows you how this is achieved: On line 10 we define var1 as the 1st datasource $DS[1]. On line 11 we define var2 as the 2nd datasource $DS[2]. On line 12 we define var3 as the 3rd datasource $DS[3].   Throughout the rest of the code, the graphs that are generated pull the specific data from the RRD file for each specific datasource.   $opt[1] and $def[1] refer to the first graph being generated. Not shown here is the code that generates the second graph, which is referenced as $opt[2] and $def[2].
  41. The last part I will talk about in relation to templates is the number formatting. Things here can get very complex indeed.   Here is an example of the numbers displayed on the custom template we modified and the relevant code. The part I am going to refer to is %3.4lf
  42. Here is an example of the numbers displayed for the five datasource .rrd file and the relevant code. The part I am going to refer to is %4.0lf
  43. What I am highlighting here is: On the first graph, the numbers are displayed with four decimal places. On the second graph, the numbers are displayed as whole numbers.
  44. The PNP documentation defines the number formatting using the printf standard described here: http://en.wikipedia.org/wiki/Printf   I must point out that, as the number one (1) and the letter "L" look alike, the format %3.4lf contains a lower case "L".   The syntax is %[parameter][flags][width][.precision][length]type
  45. Specifically I am going to focus on:   width: When the number is generated on the graph, it is allocated a specific minimum width, which helps you align numbers in a column style.   precision: Determines whether the number displayed is a whole number or a number with a specific number of digits following the decimal point.
  46. %3.4lf: width = 3, precision = .4, hence the displayed number is 25.3800   %4.0lf: width = 4, precision = .0, hence the displayed number is 14. Because the precision is 0, no decimal place is used.   To be honest I haven't spent time looking into the other options available in the formatting style, as width and precision were the only options I needed to get the results I was after.
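You can try these format strings without touching a template, because Python's % operator follows the same printf conventions (it accepts the "l" length modifier and ignores it). The values 25.38 and 14 below are simply the ones from the example graphs.

```python
first = "%3.4lf" % 25.38   # four digits after the decimal point
second = "%4.0lf" % 14     # no decimal part, padded to a width of 4
print(repr(first), repr(second))
```

Note that width is only a minimum: 25.3800 is seven characters long even though the width is 3, whereas 14 is padded with two leading spaces to reach a width of 4.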
  47. MRTG stands for the Multi Router Traffic Grapher
  48. In Nagios XI, MRTG uses a config file (/etc/mrtg/mrtg.cfg) that contains all the devices and their ports that it is going to gather data on.   When you run the Network Switch / Router wizard, it will populate the MRTG config file with the device you just queried.   MRTG is run as a cron job every 5 minutes and is defined in /etc/cron.d/mrtg   The name cron comes from the Greek word for time, χρόνος [chronos]. Hence cron is a software utility on Linux which is a time-based job scheduler. In the Windows world it's the Task Scheduler.
  49. When MRTG runs, it gathers the data from the devices defined in the mrtg.cfg file and dumps this data into the folder /var/lib/mrtg   For every port monitored an .rrd file is created. NOTE: there is no .xml file generated.   In Nagios XI, the service checks defined for the ports you want to monitor will run a command that looks for the .rrd file in the "/var/lib/mrtg" folder and then puts this information into the regular location for performance data "/usr/local/nagios/share/perfdata/<host>/<service>"
  50. As I explained before, when you run the Network Switch / Router wizard, it will populate the MRTG config file with the details about the device you just queried. In the wizard you may have only selected to monitor 10 ports on the switch. Regardless of the selections you make in the wizard, mrtg.cfg will be populated with all ports on the switch. Nagios itself will only have the service definitions for the 10 ports you selected to monitor.
  51. What you can do here is edit the mrtg.cfg file and remove all of the ports that you do not wish to gather data on. However, this can cause another issue in the future, which I will explain here.   Let's say that you now need to monitor an additional two ports on that switch. Running the Network Switch / Router wizard again takes you through all the steps, and you select these ports. However, due to how the wizard works, when it detects that this switch already exists in the mrtg.cfg file, it will not update the mrtg.cfg file. Even though you have edited the mrtg.cfg file in the past and removed these ports, the wizard does not look for this level of detail.
  52. Another similar behaviour occurs in relation to switch stacking. For example, I have a stack of two 48 port switches (96 ports in total). In the past I ran the wizard and monitored everything I needed. Now we have added an additional 48 port switch to the stack, taking the total ports to 144. Because this is a stack of switches, it is all monitored through one IP address, so the same behaviour explained above occurs. Running the Network Switch / Router wizard again takes you through all the steps, and you select these additional ports. However, due to how the wizard works, when it detects that this switch already exists in the mrtg.cfg file, it will not update the mrtg.cfg file.
  53. Use the cfgmaker tool to update the mrtg.cfg file
  54. When you are monitoring an environment that changes frequently, it helps to keep the mrtg.cfg file clean. For example, in my environment we have clients that have multiple WAN links connected in a private IP cloud. We monitor the client routers on these WAN links. From time to time WAN links are decommissioned. While we remove these client routers from the Nagios XI configuration, MRTG is still trying to collect data from these client routers. If the WAN IP no longer exists, then it is going to time out while trying to contact these routers. These timeouts are going to have an effect, especially as your mrtg.cfg file contains more and more decommissioned client routers. Keeping in mind that MRTG runs every five minutes, these timeouts can cause MRTG to run longer, and hence it's not really running every five minutes anymore.
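Some back-of-the-envelope arithmetic shows how quickly this adds up. Every number in this Python sketch except the five-minute interval is an assumption chosen for illustration (timeout, number of attempts, time for a healthy run), and it assumes unreachable targets are tried one after another. Check your own MRTG settings before drawing conclusions.

```python
INTERVAL_S = 5 * 60   # MRTG is scheduled every five minutes
TIMEOUT_S = 2         # assumed SNMP timeout per attempt
ATTEMPTS = 5          # assumed attempts before giving up on a target
HEALTHY_RUN_S = 60    # assumed time to poll all reachable devices

def run_time(dead_targets):
    """Estimate a run's duration when unreachable targets are polled one by one."""
    return HEALTHY_RUN_S + dead_targets * TIMEOUT_S * ATTEMPTS

overruns = [n for n in (10, 20, 30) if run_time(n) > INTERVAL_S]
print(overruns)
```

With these made-up figures, 30 stale targets are enough to push a run past the five-minute window.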
  55. Firmware upgrades on client routers can cause issues as well. Specifically, we've noticed this behaviour on SonicWALL firewalls. What can happen is that when a major firmware revision is released, the numbering of ports inside the firmware changes. For example, the WAN port we monitored was port 1 and the LAN port was port 2. After the firmware upgrade the WAN port became port 0 and the LAN port became port 1. We are only monitoring the WAN port using MRTG, but MRTG is still gathering data from the SonicWALL for port 1, so now your MRTG graphs reflect the data for the LAN port on the router and not the WAN port. What we saw was a massive jump in the graphs because we were collecting all the local LAN traffic passing through that port, when we were only interested in the WAN port activity.