Set Up Nagios to Monitor Your Infrastructure

There are many different monitoring tools available that can help you to monitor your infrastructure. Some are cloud-based, such as New Relic; some are closed-source, like SolarWinds; and some are open-source, like Nagios. The latter two allow you to run monitoring on-site or within your own private network, which may be advantageous if you need to adhere to PCI compliance requirements or data protection standards. In this article, we’re going to discuss one of the most-loved and most-hated monitoring tools out there: Nagios.

Objectives

  1. Set up Nagios on Ubuntu 16.04.2
  2. Design Considerations
  3. How to monitor remote systems using Nagios

Set up Nagios on Ubuntu 16.04.2

For the purposes of this article, I’ve used DigitalOcean to deploy a fresh ‘droplet’ (a virtual machine) running Ubuntu 16.04.2. These setup instructions will be very similar for Debian as well. Nagios can be run on many other OSs, including CentOS, but I find the package selection to be superior on Debian-based OSs when it comes to Nagios.

I would suggest running apt-get update on your VM before proceeding to the next step. Once you’ve done that, run the following (as root, or prefixed with sudo):

apt-get install nagios3 nagios-nrpe-server nagios-plugins

During the package installation, you’ll be prompted to set a password for logging into the Nagios web interface.

Don’t worry about this too much, as you can change it later on (it’s set in a .htpasswd file).

After this point, apt will go through and install all the requested packages, along with their dependencies. Once this is complete, you should be able to browse to the IP or hostname (assuming you have DNS configured) of your VM with nagios3 appended on the end, like so: http://<IP-or-hostname>/nagios3/ . At this point, you should be asked to log in and then see the Nagios web interface.

You should also be able to see monitoring of ‘localhost’ working correctly, which indicates Nagios has successfully started and is monitoring ‘a server.’ You can see this by clicking on Services in the left-hand menu, where the checks for localhost should be listed with their current status.

Design Considerations

Getting Nagios set up and running is only part of the fun. The problem many people encounter (including myself when I first started out with Nagios) is that it’s incredibly easy to set it up badly and end up confused. This tends to happen because Nagios will parse any file ending in .cfg within the /etc/nagios3/conf.d directory (including its subdirectories), regardless of how the file is named or how the different configuration elements, or stanzas, are spread across files.
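
To see which files Nagios would pick up from that directory, a quick listing can help. The sketch below is a minimal helper (the function name list_nagios_cfg_files is my own and not part of Nagios) that mirrors the rule described above: every file ending in .cfg, at any depth.

```shell
# Hypothetical helper: list every *.cfg file under a Nagios config directory,
# i.e. the files Nagios would read from a directory such as conf.d.
list_nagios_cfg_files() {
    dir="${1:-/etc/nagios3/conf.d}"   # default path taken from this article
    find "$dir" -type f -name '*.cfg' | sort
}
```

Calling list_nagios_cfg_files with no arguments inspects /etc/nagios3/conf.d. After changing anything, running nagios3 -v /etc/nagios3/nagios.cfg asks Nagios to verify the configuration before you restart it.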

With the above in mind, it’s worth spending a little bit of time deciding on how you want to organise your Nagios config. I’ve had lots of success with the following directory structure, although you may find a way that works better for you, so please don’t take my word for it!

I tend to leave the base configuration that the Ubuntu Nagios packages ship with in place, as it is well laid out from the outset. I then create a folder within the conf.d directory for each customer environment I’m monitoring (you may only have one environment to monitor, in which case one folder is enough). Underneath that folder, I create a hosts folder, which in turn holds one configuration file for each host I wish to monitor. Likewise, I put a hostgroups.cfg file in the environment folder to house all my host groups. With this structure in mind, it would look something like this:

/etc/nagios3/conf.d
    /<environment>
        /hosts
        hostgroups.cfg

Having a directory structure like this in place from the get-go is useful, as it allows you to organise things efficiently, rather than having config all in one big file!
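
The layout above can be created in one go. This is only a sketch: the function name create_env_layout and the environment name customer-a are made up for illustration, and the base directory is the conf.d path used throughout this article.

```shell
# Hypothetical helper: create the per-environment layout under a conf.d directory.
# $1 = base directory (e.g. /etc/nagios3/conf.d)
# $2 = environment name (any folder name you like)
create_env_layout() {
    base="$1"
    env_name="$2"
    mkdir -p "$base/$env_name/hosts"          # one .cfg file per host goes here
    touch "$base/$env_name/hostgroups.cfg"    # host groups for this environment
}

# Example (needs write access to the directory, so typically run as root):
# create_env_layout /etc/nagios3/conf.d customer-a
```

Because the helper uses mkdir -p and touch, running it twice for the same environment is harmless.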

All the other files in the conf.d directory are fairly self-explanatory, but do require further reading of the documentation to make absolute sense.

How to monitor remote systems using Nagios

There are a number of ways to monitor remote systems from within Nagios, but the most common are NRPE (Nagios Remote Plugin Executor, via the check_nrpe plugin) and running commands over SSH (via the check_by_ssh plugin; check_ssh, by contrast, only tests that an SSH server is responding). Personally, I’ve always used the NRPE daemon, as it allows the Nagios server to communicate with the server it’s monitoring over TCP port 5666 with SSL, execute a command on the remote system, and then use the output and exit code of that command for the monitoring status.
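
On the remote host, the NRPE daemon only runs commands that are defined by name in its own configuration, and only accepts connections from addresses listed in its allowed_hosts setting. The sketch below appends such a command definition to a config file. The helper name add_nrpe_command is my own, and the paths in the example are the ones I would expect on Debian/Ubuntu, so treat them as assumptions and check them on your system.

```shell
# Hypothetical helper: add a named command to an NRPE config file,
# unless a command with that name is already defined there.
# $1 = config file, $2 = command name, $3 = command line to run
add_nrpe_command() {
    cfg="$1"
    name="$2"
    cmdline="$3"
    if [ -f "$cfg" ] && grep -q "^command\[$name\]=" "$cfg"; then
        echo "command[$name] already defined in $cfg"
        return 0
    fi
    printf 'command[%s]=%s\n' "$name" "$cmdline" >> "$cfg"
}

# Example on the monitored host (assumed Debian/Ubuntu paths, run as root):
# add_nrpe_command /etc/nagios/nrpe.cfg check_all_disks \
#     '/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -e'
# Then restart the NRPE daemon so it rereads its configuration.
```

Hard-coding the thresholds on the remote side like this means the Nagios server only has to send the command name. Passing thresholds from the server as arguments is only possible if NRPE was built with argument support and dont_blame_nrpe=1 is set in nrpe.cfg.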

The basic monitoring checks are painfully simple to write. They rely on the exit code of the check script, which ultimately determines how the check status is displayed within the Nagios GUI, and thus how alerts are generated (or not). The commonly used exit codes are:

Exit code    Status
0            OK
1            WARNING
2            CRITICAL
3            UNKNOWN

Specifically, exit code 0 (OK) will show a green status in Nagios, exit code 1 (WARNING) will show a yellow status, and 2 (CRITICAL) will show a red status. This basic traffic light system is all you need to know what’s going on.
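
To make the exit-code convention concrete, here is a minimal sketch of the core of a check. The function name check_percent_used and its argument order are my own invention for illustration; real plugins usually take -w and -c options instead.

```shell
# Hypothetical check: classify a "percent used" value against thresholds.
# $1 = percent used, $2 = warning threshold, $3 = critical threshold
# Prints one status line and returns the matching Nagios exit code.
check_percent_used() {
    used="$1"
    warn="$2"
    crit="$3"
    # Anything that is not a whole number cannot be judged: UNKNOWN (3).
    case "$used" in
        ''|*[!0-9]*) echo "UNKNOWN - '$used' is not a whole number"; return 3 ;;
    esac
    if [ "$used" -ge "$crit" ]; then
        echo "CRITICAL - ${used}% used"
        return 2
    elif [ "$used" -ge "$warn" ]; then
        echo "WARNING - ${used}% used"
        return 1
    fi
    echo "OK - ${used}% used"
    return 0
}
```

In a standalone plugin script you would gather the value (from df, for example), call the function, and finish with exit $? so that Nagios sees the status as the script’s exit code.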

The localhost server which is currently being monitored in Nagios is a good example to show how to go about monitoring a server. To do this, let’s take a look at the Nagios config that carries out this monitoring, which can be found at: /etc/nagios3/conf.d/localhost_nagios2.cfg

Within this file, you’ll notice a host stanza which defines the host being monitored:

define host{
        use                     generic-host            ; Name of host template to use
        host_name               localhost
        alias                   localhost
        address                 127.0.0.1
        }

In addition to this, you’ll notice that there are multiple service definitions. These link to the services being monitored that you saw on the screen earlier when we first logged into the Nagios GUI.

define service{
        use                             generic-service         ; Name of service template to use
        host_name                       localhost
        service_description             Disk Space
        check_command                   check_all_disks!20%!10%
        }

In the above example, the host stanza tells Nagios to monitor localhost, and the service stanza tells Nagios to run the ‘check_all_disks’ command against it to check disk space. The values after each ! are passed to the command as arguments; here they are the warning and critical thresholds that the check will alert on. How thresholds are interpreted depends on the plugin you’re using (for a disk check they may refer to free or used space), so it’s worth reading its documentation.
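
The way a check_command value is split can be illustrated with a few lines of shell. This is purely a demonstration of the ! separator and the $ARG1$, $ARG2$ macros that command definitions refer to; the function explain_check_command is hypothetical and is not how Nagios is implemented.

```shell
# Illustration only: split a check_command value on "!" into the command
# name and its numbered arguments ($ARG1$, $ARG2$, ... in Nagios terms).
explain_check_command() {
    old_ifs="$IFS"
    IFS='!'
    set -f          # no filename globbing while splitting
    set -- $1       # unquoted on purpose: split on "!" only
    set +f
    IFS="$old_ifs"
    echo "command=$1"
    shift
    n=1
    for arg in "$@"; do
        echo "ARG$n=$arg"
        n=$((n + 1))
    done
}

explain_check_command 'check_all_disks!20%!10%'
# prints:
# command=check_all_disks
# ARG1=20%
# ARG2=10%
```

With the stock check_disk plugin, which I believe is what the packaged check_all_disks command wraps, percentage thresholds like these refer to free space, so 20% would warn when less than a fifth of a disk is free.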

If you choose to monitor a remote server using NRPE, you’ll want to change the service stanza to look something like this (with host_name set to the remote host rather than localhost):

define service{
        use                             generic-service         ; Name of service template to use
        host_name                       localhost
        service_description             Disk Space
        check_command                   check_nrpe!check_all_disks!20%!10%
        }

The above amendment tells Nagios to run the ‘check_all_disks’ check on the remote server by connecting to it over NRPE, rather than running the plugin locally on the Nagios server.
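
Putting the pieces together for a remote host, the sketch below writes a host stanza and one NRPE-based service into a per-host file. Everything named here is an assumption for illustration: the helper write_remote_host_cfg, the host web01, and the documentation address 192.0.2.10. It also assumes a check_nrpe_1arg command (which I believe the Debian/Ubuntu nagios-nrpe-plugin package defines on the Nagios server) and a check_all_disks command defined in the remote host’s NRPE configuration.

```shell
# Hypothetical helper: write a host stanza plus one NRPE-based service
# into <hosts_dir>/<host_name>.cfg.
# $1 = hosts directory, $2 = host name, $3 = IP address
write_remote_host_cfg() {
    hosts_dir="$1"
    host_name="$2"
    address="$3"
    cat > "$hosts_dir/$host_name.cfg" <<EOF
define host{
        use                     generic-host
        host_name               $host_name
        alias                   $host_name
        address                 $address
        }

define service{
        use                     generic-service
        host_name               $host_name
        service_description     Disk Space
        check_command           check_nrpe_1arg!check_all_disks
        }
EOF
}

# Example (made-up environment, host name and address):
# write_remote_host_cfg /etc/nagios3/conf.d/customer-a/hosts web01 192.0.2.10
```

If your installation has no check_nrpe_1arg command, substitute whichever check_nrpe command definition you do have, then verify and reload the Nagios configuration.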

Conclusion

It takes patience, along with some thought on how you’re going to design your Nagios solution from the outset, but it’s well worth doing. Nagios is an incredibly powerful monitoring tool, and despite other free and paid offerings available, it still regularly comes out on top, and it’s still in use today after many years.

Resources

  1. Nagios return codes (exit codes): https://nagios-plugins.org/doc/guidelines.html#AEN78
  2. Nagios documentation: https://www.nagios.org/documentation/
  3. How to monitor a Unix/Linux machine: https://assets.nagios.com/downloads/nagioscore/docs/nagioscore/3/en/monitoring-linux.html

Keith Rogers is an IT professional with over 10 years’ experience in modern development practices. Currently he works for a broadcasting organization in the DevOps space with a focus on automation. Keith is a regular contributor at Fixate IO.

