OpenNMS is an open source network management platform that provides fault and performance management for online services like websites and databases running on networks with various systems. It monitors devices using SNMP and ICMP polling, generates alarms for faults, and collects performance data like response times, bandwidth usage, and CPU loads. The web interface provides overviews and details on availability, events, alarms, discovery of hosts, and categories for organizing services and custom views. Scheduled outages can be configured to avoid false alarms during planned maintenance.
2. Introduction
• OpenNMS is the first enterprise-grade
network management platform developed
under the open source model
• It targets organizations that provide online
services such as websites, various web
applications, and databases.
• These organizations run a variety of systems on
their networks, such as business servers with both
Windows and Linux, proprietary VoIP systems,
backup systems, and virtualized platforms.
4. Feature(2)
• Performance Mgmt
Availability monitoring
Avg Response time data
Bandwidth utilization data
Buffer usage data
Conditional Alerts
Consistent performance level
CPU load data
Device Disk Space Data
Device Fan Monitoring
Device Temperature Monitoring
Hardware Monitoring
Historical logs
Interface Error Data
Memory Utilization
Network Discard Data
Network Discovery
Network Latency Data
Network Topology
Packet loss data
Performance data collection
Performance report generation
Power supply monitoring
Syslog Messages
Utilization and Error Rates
5. The Web User Interface
•The Web UI is very straightforward to use
•The home page provides a very high-level overview of
the information available
•As shown below
6. Host Discovery
•The very first thing to do is to configure some hosts to monitor
•By default OpenNMS monitors no network nodes at all
•Select Admin | Configure Discovery | Add New to add nodes individually to
be discovered
•Once completed we will get something like what is shown in the following
Specifics table screenshot
7. Availability
•Probably the most useful information at this point is the
availability of the services discovered on the node.
•Services are discovered on the node's network interfaces and
listed in the table
•When a service goes down, OpenNMS triggers an (outage) event
and the availability of the service is affected.
•The availability table before and after a service went down is
shown side-by-side in the screenshot that follows
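To make the link between outages and availability concrete, here is a minimal sketch of how a 24-hour availability figure could be derived from recorded outages. It is not how OpenNMS computes the value internally; the function name, the tuple layout, and the sample 36-minute outage are assumptions made for illustration, and outages are assumed not to overlap.

```python
from datetime import datetime, timedelta


def availability(outages, window_start, window_end):
    """Return the percentage of the window during which the service was up.

    `outages` is a list of (lost, regained) datetime pairs; `regained` may be
    None for an outage that is still open. Outages are assumed not to overlap.
    """
    window = (window_end - window_start).total_seconds()
    down = 0.0
    for lost, regained in outages:
        # Clip each outage to the reporting window.
        start = max(lost, window_start)
        end = min(regained or window_end, window_end)
        if end > start:
            down += (end - start).total_seconds()
    return 100.0 * (window - down) / window


now = datetime(2013, 3, 23, 12, 0)
day_ago = now - timedelta(hours=24)
# One hypothetical 36-minute service outage inside the last 24 hours.
outages = [(datetime(2013, 3, 23, 5, 0), datetime(2013, 3, 23, 5, 36))]
print(f"{availability(outages, day_ago, now):.3f}%")
```

With these sample values, 36 minutes of downtime out of 1,440 minutes leaves the service at 97.5% availability for the window.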
8. Events and Alarms
•Events can be thought of as immutable records while
•Alarms add a mutable lifecycle to the event
management subsystem of OpenNMS.
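The distinction between immutable events and alarms with a lifecycle can be sketched as follows. This is only an illustration of the idea, not the OpenNMS data model: the class names, the field names, and the key used to fold repeated events into one alarm are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Event:
    """An immutable record: once created, its fields cannot be changed."""
    kind: str
    node: str
    service: str
    time: datetime


@dataclass
class Alarm:
    """A mutable view over repeated events, with its own lifecycle."""
    key: str
    last_event: Event
    count: int = 1
    acknowledged_by: Optional[str] = None
    cleared: bool = False


class AlarmTable:
    def __init__(self):
        self.alarms = {}

    def on_event(self, event):
        # Events of the same kind on the same node and service share one alarm.
        key = f"{event.kind}:{event.node}:{event.service}"
        alarm = self.alarms.get(key)
        if alarm is None:
            alarm = self.alarms[key] = Alarm(key, event)
        else:
            alarm.last_event = event
            alarm.count += 1
        return alarm
```

Feeding two identical service-down events into `AlarmTable.on_event` yields a single alarm with `count == 2`, which can then be acknowledged or cleared without touching the underlying event records.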
9. Organizing Users, Groups, and
Categories
• Users: The technical people like the employees
of an organization.
• Groups: Can be used to group users by similar
work responsibilities or even based on duty
schedules. For example, Systems Administrators,
Night Technicians, or Routers Maintenance Team
can all be perfectly valid groups in OpenNMS.
• Roles: Although duty schedules can be set both
on users and groups, Roles are specifically
designed to define On Call Schedules for the staff.
11. Categories Management
•Service Level Management Categories (SLM categories) are used to group
interfaces and services.
•You can see the default SLM categories on the home page (once logged in) in
the Availability Over the Past 24 Hours table (aka. the Availability table) in the
middle of the page.
•Those categories can be configured to your liking and are useful to include in
historical availability reports for management.
12. Categories Management (Cont.)
•Surveillance Categories are used to define per-user custom Surveillance
Views.
•Surveillance views define the scope and layout of a user dashboard with the
purpose of providing a quick view of the state of the network (or segment of
the network).
•Surveillance categories are not configured in XML like SLM categories but can
be added, edited, or deleted directly from the Web UI and are stored in the
database.
13. Service assurance through polling
• Polling via classes called monitors is the mechanism
OpenNMS uses to assure the availability of network
services.
• Various monitors are in charge of polling specific services
such as HTTP, POP, or IMAP and custom monitors can be
added for less common and more specific monitoring needs.
• It is possible to fine tune exactly how you wish to poll
services.
• For example, you can define the frequency of polling, the
number of retries before triggering an event, and the amount
of time to wait for a service response. Response-time data
can also optionally be persisted for later use.
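The polling parameters listed above (interval, retries, timeout, and optional persistence of response times) can be sketched in a few lines. OpenNMS monitors are configured classes inside the platform; the functions below are a hypothetical stand-alone illustration, and the names `tcp_monitor`, `poll`, and `run_poller` are assumptions, not OpenNMS APIs.

```python
import socket
import time


def tcp_monitor(host, port, timeout):
    """Return the response time in seconds, or None if there was no answer."""
    started = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - started
    except OSError:
        return None


def poll(monitor, retries, history=None, **params):
    """Try the monitor up to 1 + retries times and report 'up' or 'down'."""
    for _ in range(1 + retries):
        response_time = monitor(**params)
        if response_time is not None:
            if history is not None:
                # Optionally persist response-time data for later use.
                history.append(response_time)
            return "up"
    return "down"


def run_poller(monitor, interval, cycles, retries, sleep=time.sleep, **params):
    """Poll `cycles` times, waiting `interval` seconds between polls."""
    history, statuses = [], []
    for cycle in range(cycles):
        statuses.append(poll(monitor, retries, history, **params))
        if cycle < cycles - 1:
            sleep(interval)
    return statuses, history
```

For example, `run_poller(tcp_monitor, interval=300, cycles=3, retries=2, host="192.0.2.10", port=80, timeout=3.0)` would check a (placeholder) web server three times, five minutes apart, and only report it down after three failed attempts in a row.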
14. Scheduled Outages
• Another important way in which polling services can be
configured is via scheduled outages.
• As the name suggests, scheduled outages are specific
periods of time when certain devices will be brought down
for general maintenance.
• Configuring scheduled outages when you bring nodes down
for maintenance is not mandatory.
• However, when scheduled outages are not configured there
is no way of knowing the difference between a real outage
and a planned outage.
• Planned outages will then be detected as real, triggering
events and possibly alarms and notices, which in turn will
affect the overall availability metrics.
Node List shows all existing nodes in the system with the option to conduct an advanced search, useful when managing large infrastructures.
Outages are created when OpenNMS cannot poll services previously provisioned.
Custom dashboards can be created under Dashboard for various roles.
Members of the technical team can have a customized view depending on their responsibilities.
Events are numerous and often superficial, but sometimes you really want to deal with them immediately.
Alarms can be customized to your needs.
The Notifications mechanism can send notices when such important events occur.
Assets can be tracked to help you manage your organization infrastructure.
A growing list of predefined reports and charts is available under Reports and Charts respectively.
Surveillance is essentially a customizable dashboard that provides a quick glance at problems on the network.
Distributed Map is meant to be an area to display a map of remote pollers, an advanced feature of OpenNMS still under heavy development.
You can do this either by editing the configuration file $OPENNMS_HOME/etc/discovery-configuration.xml or through the Web UI.
When adding nodes, the IP address should be used; OpenNMS will then get hostnames using services such as SNMP or DNS.
It is necessary to click on Save and Restart Discovery when done adding nodes.
All we did was to enter some IP addresses and we already have a functioning system that provides useful benefits.
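Since discovery expects IP addresses rather than hostnames, a list of candidate entries can be checked before it is typed into the Web UI. The helper below is a small sketch using Python's standard `ipaddress` module; the function name and the sample entries are made up for illustration and have nothing to do with OpenNMS itself.

```python
import ipaddress


def split_discovery_entries(entries):
    """Separate valid IP addresses from entries that are not IP addresses."""
    valid, rejected = [], []
    for entry in entries:
        try:
            valid.append(str(ipaddress.ip_address(entry.strip())))
        except ValueError:
            # Hostnames and malformed addresses end up here.
            rejected.append(entry)
    return valid, rejected


valid, rejected = split_discovery_entries(
    ["192.168.1.10", " 10.0.0.5 ", "db1.example.com", "192.168.1.300"]
)
print(valid)     # addresses that can be added to discovery
print(rejected)  # entries to look up or correct first
```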
Select one of the nodes under the top menu Node List.
We will have a closer look at what OpenNMS can already do.
The node page provides all information related to a given node, such as events, alarms, past and present outages,
asset information, and more.
When the PostgreSQL service was detected to be down, both an event and an outage were created as shown in the next Recent Events and Recent Outages screenshot.
Upon close examination of the newly generated Postgres outage event,
some might wonder why an outage is considered to be a Minor event.
The rationale behind this is that the severity level should be based
on the reaction to the problem and not necessarily the problem
itself. A simple service going down can be considered minor when
considering the bigger picture, the whole of network operations. Of
course, a service could just as well be mission critical to the organization,
in which case it would be essential to change the severity associated
with an outage on that particular service to major or critical. The severity
levels are detailed at http://www.opennms.org/wiki/Severity.
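The idea of a default severity that is raised for mission-critical services can be sketched as a simple lookup. Only the three levels named in the text are modelled here (the full list is on the wiki page above), and the override table, the numeric values, and the function name are assumptions for illustration, not OpenNMS configuration.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Only the levels mentioned in the text; numeric values are arbitrary
    # and serve only to order the levels.
    MINOR = 1
    MAJOR = 2
    CRITICAL = 3


DEFAULT_OUTAGE_SEVERITY = Severity.MINOR

# Hypothetical per-service overrides for a mission-critical database.
OVERRIDES = {"PostgreSQL": Severity.CRITICAL}


def outage_severity(service, overrides=OVERRIDES):
    """Return the severity to attach to an outage event for `service`."""
    return overrides.get(service, DEFAULT_OUTAGE_SEVERITY)


print(outage_severity("HTTP").name)
print(outage_severity("PostgreSQL").name)
```

Keeping the default low and overriding it per service matches the rationale above: the severity reflects the reaction the organization wants, not the nature of the fault.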
When the service is restored, a new event is generated and the outage
status changes from down to up again as shown in the second Recent
Events and Recent Outages screenshot. All this information is stored in
the database and available for future reporting needs.
Configuring users and groups is sufficient to explore the features to
come. Let's assume we have a hypothetical technical team made up of
a technical manager, two system administrators and four technicians.
Technicians look after hardware related problems and are usually the first
ones to be notified of issues. Systems administrators troubleshoot more
advanced problems on the network and are the next ones in line. The
technical manager oversees all operations including the following
configuration of users and groups.
The configuration of users is persisted in
$OPENNMS_HOME/etc/users.xml, but managing users is usually done via
the Web UI. A user is created by filling in the form accessed through
Admin | Configure Users, Groups and On-Call Roles | Configure Users |
Add New User. A user will be created for every member of our
hypothetical technical team; they will be named tech1, tech2, tech3,
tech4, sysadmin1, sysadmin2, and techmanager. The notification
information is used to notify users of problems, so it is particularly
important to at least include a valid email.
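The hypothetical team can be written down as plain data to check that every member has an account, an email address, and a place in the notification order. This is a planning sketch only: the group names, the `example.com` addresses, and both helper functions are assumptions, and nothing here reads or writes users.xml.

```python
# Hypothetical group names; the user names are the ones listed in the text.
GROUPS = {
    "Technicians": ["tech1", "tech2", "tech3", "tech4"],
    "System Administrators": ["sysadmin1", "sysadmin2"],
    "Technical Manager": ["techmanager"],
}

# Technicians are notified first, then administrators, then the manager.
ESCALATION = ["Technicians", "System Administrators", "Technical Manager"]


def make_users(groups, domain="example.com"):
    """Build one user record per group member, each with an email address."""
    return {
        name: {"email": f"{name}@{domain}", "group": group}
        for group, members in groups.items()
        for name in members
    }


def notification_order(groups, escalation):
    """Return user names in the order they would be notified."""
    return [name for group in escalation for name in groups[group]]


users = make_users(GROUPS)
print(len(users))
print(notification_order(GROUPS, ESCALATION))
```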
Once completed, the list of
users should be showing under Admin | Users and Groups |
Configure Users displaying something like the following screenshot:
The Availability Over the Past 24 Hours table and its SLM categories, as
displayed on the Web UI home page, can be customized to look like the
following screenshot by editing the XML configuration files
$OPENNMS_HOME/etc/categories.xml and
$OPENNMS_HOME/etc/viewsdisplay.xml as explained in detail at
http://www.opennms.org/wiki/Configure_Main_Window_Categories.
Under the top menu's Surveillance link there is a default Surveillance
View which is based on the default surveillance categories. If you
choose to customize them, care must be taken to also modify the default
surveillance view in $OPENNMS_HOME/etc/surveillance-views.xml.
In this case, our new custom default Surveillance View (accessed
under the Surveillance top menu link) would look like the following
screenshot. The information on nodes inside the table cells is
automatically calculated.
The configuration for polling is spread across a few files. The file
$OPENNMS_HOME/etc/poller-configuration.xml is the main
configuration file to adjust how polling works. The file
$OPENNMS_HOME/etc/poller-config.properties contains a list of the
monitors that come with OpenNMS out of the box. It is used only
internally and should not be modified. Finally,
$OPENNMS_HOME/etc/poll-outages.xml is used to define scheduled
outages for maintenance.
Polling configuration and usage scenarios will be further discussed in
following sections.
The configuration of scheduled outages can be done through the
$OPENNMS_HOME/etc/poll-outages.xml file or directly from the Web UI
by clicking on Admin | Scheduled Outages. Any number of scheduled
outages can be added in a flexible manner, such as specific hosts or
groups of hosts; specific, daily, weekly or monthly time frames; and
which services will be affected. The screenshot that follows shows how
to configure a specific planned outage for host host1.example.com at
5:00 AM on 23 March, 2013 for the duration of one hour affecting all
monitoring services. This scheduled outage will permit the technical
team to bring down this server for maintenance without OpenNMS
detecting an outage and triggering events, alarms, and notifications.
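The effect of the scheduled outage described above can be sketched as a check performed before an outage is raised. The class and function names below are hypothetical and the code does not read poll-outages.xml; it models only the "specific" time frame from the example (one hour starting at 5:00 AM on 23 March 2013), not the daily, weekly, or monthly variants.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ScheduledOutage:
    name: str
    hosts: frozenset
    start: datetime
    duration: timedelta
    services: frozenset = frozenset()  # empty set means all services

    def covers(self, host, service, when):
        """True if `when` falls inside this outage for the host and service."""
        return (
            host in self.hosts
            and (not self.services or service in self.services)
            and self.start <= when < self.start + self.duration
        )


def should_raise_outage(host, service, when, schedule):
    """A failed poll only counts as an outage outside scheduled windows."""
    return not any(o.covers(host, service, when) for o in schedule)


maintenance = ScheduledOutage(
    name="host1 maintenance",
    hosts=frozenset({"host1.example.com"}),
    start=datetime(2013, 3, 23, 5, 0),
    duration=timedelta(hours=1),
)
schedule = [maintenance]
print(should_raise_outage("host1.example.com", "HTTP", datetime(2013, 3, 23, 5, 30), schedule))
print(should_raise_outage("host1.example.com", "HTTP", datetime(2013, 3, 23, 6, 0), schedule))
```

A failure at 5:30 AM falls inside the window and is suppressed, while a failure at 6:00 AM, after the hour has elapsed, is treated as a real outage.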