Overview

Nagios Core Server combined with the Nagios Remote Plugin Executor (NRPE) Server allows Nagios to execute remote plugins on client servers. This post discusses automated ways to override the default plugin tests and parameters.

Assumptions

This post assumes you have the following already up and running:

  • Nagios Core Server
  • Consul Server(s)
  • Client Servers running:
    • consul
    • consul-template
    • nagios-nrpe-server
    • Ingress Port 5666 open to Nagios Core Server

Consul Keys

I name my Consul nodes <role-name>.<service-name>.<datacenter-name>, which gives me three levels of override hierarchy. I also organize my Consul key/value store by <datacenter|services|hosts>/<service-name|hostname>/<role|nil>/<path-to-override-file>. For instance, a client server with a Consul node ID of server1-master.mysql.east1 could have Consul keys at any of these locations (from least to most specific):

  • east1/etc/nagios/nrpe.d/overrides
  • services/mysql/master/etc/nagios/nrpe.d/overrides
  • hosts/server1-master.mysql.east1/etc/nagios/nrpe.d/overrides
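To make the naming scheme concrete, here is a minimal Python sketch that derives the three key paths from a node ID. The function name override_keys is hypothetical, and it assumes every node ID has exactly three dot-separated parts with the role following the first hyphen (the same regex the template in this post uses). Note that it derives the datacenter from the node ID, whereas the template hardcodes east1.

```python
import re
from typing import List

# Path suffix shared by all three override levels (taken from the examples above).
OVERRIDE_PATH = "etc/nagios/nrpe.d/overrides"


def override_keys(node_id: str) -> List[str]:
    """Return the override keys for a node ID, from least to most specific."""
    # Assumption: node IDs look like <role-name>.<service-name>.<datacenter-name>.
    role_name, service, datacenter = node_id.split(".")
    # Strip the leading "<host>-" prefix so "server1-master" becomes "master".
    role = re.sub(r"[a-z0-9]+-(.*)", r"\1", role_name)
    return [
        f"{datacenter}/{OVERRIDE_PATH}",
        f"services/{service}/{role}/{OVERRIDE_PATH}",
        f"hosts/{node_id}/{OVERRIDE_PATH}",
    ]


for key in override_keys("server1-master.mysql.east1"):
    print(key)
```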

Consul-Template

NRPE reads config files top-down, so a configuration file can contain duplicate plugin definitions. The last definition of each plugin in the file is the one executed, and the definitions above it are ignored. Taking advantage of this, we will configure Consul-Template to read keys from least to most specific: datacenter, then service name, and finally hostname.
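The "last definition wins" behavior can be modeled in a few lines of Python. This is only a sketch of the behavior described above, not NRPE's actual parser; the function name effective_commands and the sample text are made up for illustration.

```python
import re

# Matches NRPE command definitions such as: command[check_load]=/path/to/plugin args
COMMAND_RE = re.compile(r"^command\[(?P<name>[^\]]+)\]=(?P<line>.*)$")


def effective_commands(config_text):
    """Map each command name to its last definition in the file."""
    commands = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = COMMAND_RE.match(line)
        if match:
            # A later definition simply overwrites an earlier one.
            commands[match.group("name")] = match.group("line")
    return commands


# Hypothetical rendered file: datacenter layer first, service layer second.
RENDERED = """\
# east1/etc/nagios/nrpe.d/overrides
command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
# services/mysql/master/etc/nagios/nrpe.d/overrides
command[check_load]=/usr/lib/nagios/plugins/check_load -w 20,15,10 -c 30,25,20
"""

print(effective_commands(RENDERED)["check_load"])
```

Because the more specific layer is written later in the file, its definition is the one that survives.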

Here’s the configuration file.

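template {
  # Template that assembles the override layers
  source = "/etc/consul-template.d/templates/nrpe_server.ctmpl"
  # Rendered NRPE overrides file
  destination = "/etc/nagios/nrpe.d/overrides.cfg"
  # Restart NRPE whenever the rendered file changes
  command = "service nagios-nrpe-server restart"
}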

Here’s our template file.

{{ with $id := node }}
# east1/etc/nagios/nrpe.d/overrides
{{ if keyExists "east1/etc/nagios/nrpe.d/overrides" }}{{ key "east1/etc/nagios/nrpe.d/overrides" }}{{ end }}

# {{printf "services/%s/%s/etc/nagios/nrpe.d/overrides" (index ($id.Node.Node | split ".") 1) ((index ($id.Node.Node | split ".") 0) | regexReplaceAll "[a-z0-9]+-(.*)" "$1")}}
{{ if keyExists (printf "services/%s/%s/etc/nagios/nrpe.d/overrides" (index ($id.Node.Node | split ".") 1) ((index ($id.Node.Node | split ".") 0) | regexReplaceAll "[a-z0-9]+-(.*)" "$1")) }}{{ printf "services/%s/%s/etc/nagios/nrpe.d/overrides" (index ($id.Node.Node | split ".") 1) ((index ($id.Node.Node | split ".") 0) | regexReplaceAll "[a-z0-9]+-(.*)" "$1") | key }}{{ end }}

# {{printf "hosts/%s/etc/nagios/nrpe.d/overrides" ($id.Node.Node)}}
{{ if keyExists (printf "hosts/%s/etc/nagios/nrpe.d/overrides" ($id.Node.Node)) }}{{ printf "hosts/%s/etc/nagios/nrpe.d/overrides" ($id.Node.Node) | key }}{{ end }}
{{ end }}

Consul Key/Values

Now, for any test you want to override, enter a key and value at the level where the override should apply. For example, the NRPE check_load default settings are

command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20 

If you want to override the CPU load test for all master MySQL servers by raising the warning threshold from 15,10,5 to 20,15,10, the key in Consul is

services/mysql/master/etc/nagios/nrpe.d/overrides

and the value could be

command[check_load]=/usr/lib/nagios/plugins/check_load -w 20,15,10 -c 30,25,20

If you only wanted to set the value on one specific client server, you could add a key

hosts/server1-master.mysql.east1/etc/nagios/nrpe.d/overrides

Upon adding, changing or deleting one of the Consul keys, the affected servers will pull down the override values, write them to a file called overrides.cfg and restart nagios-nrpe-server. All subsequent tests will use the overridden values.
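If you would rather set a key from code than from the Consul UI, here is a small sketch that builds the request against Consul's KV HTTP endpoint (PUT /v1/kv/<key>). The helper name build_kv_put and the agent address http://127.0.0.1:8500 are assumptions, and the sketch only constructs the request; actually sending it needs a reachable Consul agent and, if ACLs are enabled, a token.

```python
import urllib.parse
import urllib.request


def build_kv_put(key, value, consul_addr="http://127.0.0.1:8500"):
    """Build (but do not send) a PUT request that stores value at key."""
    # quote() leaves "/" unescaped, so the key hierarchy stays intact in the URL.
    url = f"{consul_addr}/v1/kv/{urllib.parse.quote(key)}"
    return urllib.request.Request(url, data=value.encode("utf-8"), method="PUT")


request = build_kv_put(
    "services/mysql/master/etc/nagios/nrpe.d/overrides",
    "command[check_load]=/usr/lib/nagios/plugins/check_load -w 20,15,10 -c 30,25,20",
)
print(request.get_method(), request.full_url)

# To actually write the key (assumes a Consul agent is listening locally):
# urllib.request.urlopen(request)
```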

That’s It

I use this to manage custom settings for hundreds of servers. There are some cons to using this method.

  • After a while there can be a lot of overrides scattered around the Consul key store, and it can be confusing as to which servers have overrides. To find them, you can query the KV store in code and filter on only keys with nagios/nrpe.d/overrides in the path.
  • Changing the datacenter level overrides will affect all client servers in the entire datacenter which may or may not be what you want. Be very careful changing these settings.
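As a rough illustration of the first point, this sketch lists key names from the KV store and keeps only the override keys. The function names and the agent address are assumptions; the listing call relies on the keys query parameter of Consul's KV endpoint and is not executed here, and the sample key list (including app/config/db) is invented for the example.

```python
import json
import urllib.request

# Path fragment shared by every override key in this post.
OVERRIDE_FRAGMENT = "nagios/nrpe.d/overrides"


def list_all_keys(consul_addr="http://127.0.0.1:8500"):
    """Fetch every key name from Consul (requires a reachable agent)."""
    # Assumption: ?keys returns a JSON array of key names; recurse is added explicitly.
    url = f"{consul_addr}/v1/kv/?keys=true&recurse=true"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def override_keys_only(keys):
    """Keep only the keys that hold NRPE overrides, sorted for readability."""
    return sorted(key for key in keys if OVERRIDE_FRAGMENT in key)


# Hypothetical key listing, standing in for the result of list_all_keys().
SAMPLE_KEYS = [
    "services/mysql/master/etc/nagios/nrpe.d/overrides",
    "app/config/db",
    "hosts/server1-master.mysql.east1/etc/nagios/nrpe.d/overrides",
    "east1/etc/nagios/nrpe.d/overrides",
]

for key in override_keys_only(SAMPLE_KEYS):
    print(key)
```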