From ZhoubaWiki

Introduction to setting up event handlers executed on remote servers

As an illustration, the installation of the eh_restart_service event handler is described.

Monitoring server part

With the check_remote command defined like this:

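# Runs the NRPE command named in $ARG1$ on the remote host;
# -u turns socket timeouts into UNKNOWN instead of CRITICAL, -t 120 sets the timeout in seconds
define command {
	command_name check_remote
	command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -p $_HOSTPORT$ -u -c $ARG1$ -t 120
}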
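To make it clear what the monitoring server actually runs for `event_handler check_remote!restart_service_tracker`, here is a small sketch of the macro expansion. The values are hypothetical: `$USER1$` is commonly set to /usr/lib/nagios/plugins in resource.cfg, 192.0.2.10 is a documentation address, and `$_HOSTPORT$` is the custom host variable `_PORT` from the host definition (assumed here to be 5666, the default NRPE port). Everything after the `!` becomes `$ARG1$`.

```sh
#!/bin/sh
# Hypothetical macro values, for illustration only
USER1=/usr/lib/nagios/plugins       # $USER1$, usually defined in resource.cfg
HOSTADDRESS=192.0.2.10              # $HOSTADDRESS$ of the remote host
HOSTPORT=5666                       # $_HOSTPORT$, i.e. custom host variable "_PORT 5666"
ARG1=restart_service_tracker        # $ARG1$, the part after "!" in check_remote!restart_service_tracker

# The command line built from check_remote's command_line after substitution
cmd="$USER1/check_nrpe -H $HOSTADDRESS -p $HOSTPORT -u -c $ARG1 -t 120"
echo "$cmd"
```

Running the printed command by hand on the monitoring server is a quick end-to-end test of the whole chain; keep in mind that it really does trigger the restart on the remote host.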

only a single new line, the event_handler directive, has to be added to the service definition, like this:

# generic response code template
define service{
    name			response-code-template
    use				generic-service-charted
    check_interval		0.25
    retry_interval		0.25
    max_check_attempts		3
    flap_detection_enabled	0
    register			0
}

# trackers
define service{
    use				response-code-template
    hostgroup_name		trackers
    service_description		Check HTTPS success response code
    check_command		check_https_success_response!S!/
    # the new line: run the NRPE command restart_service_tracker on the host via check_remote
    event_handler		check_remote!restart_service_tracker
}

Note that the max_check_attempts directive has to be set to a value greater than 1, ideally at least 3. Event handlers are already executed on soft problem states (1/3, 2/3), so with several check attempts the handler gets a chance to restart the service before the state becomes hard and unnecessary notifications are sent.
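A simplified model of that sequence for a service that keeps failing, with max_check_attempts set to 3: the first two non-OK results are soft, the third one is hard, and that is the point where notifications go out. This is only an illustration written for this page, not Nagios code; it ignores recovery, retry timing and notification settings.

```sh
#!/bin/sh
# Simplified model (not Nagios code) of a failing service's check attempts
max_check_attempts=3

# Print what happens at a given check attempt of a failing service
describe_attempt() {
    attempt=$1
    if [ "$attempt" -lt "$max_check_attempts" ]; then
        echo "CRITICAL SOFT $attempt/$max_check_attempts: event handler runs, no notification"
    else
        echo "CRITICAL HARD $attempt/$max_check_attempts: event handler runs, notification sent"
    fi
}

for attempt in 1 2 3; do
    describe_attempt "$attempt"
done
```

With max_check_attempts set to 1 the very first failure would already be hard, so the handler and the notification would fire at the same moment.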

Remote server part

1. Copy the event handler script (it can be any kind of executable) to the remote server's plugin directory (typically /usr/lib/nagios/plugins).

2. Register it as an NRPE command in nrpe.cfg under the same name that check_nrpe receives as its -c parameter (in this case restart_service_tracker), like this:

command[restart_service_tracker]=sudo /usr/lib/nagios/plugins/eh_restart_service -s tracker
Note the sudo prefix: restarting services requires broader privileges than the nagios user has.

3. Don't forget to restart the NRPE daemon afterwards:

/etc/init.d/nagios-nrpe-server restart

4. Grant the nagios user the privilege to execute Nagios plugins via sudo. An example /etc/sudoers snippet (preferably edited with visudo) follows:

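# Allow the nagios user to run the plugins as root without a password
# (a trailing / matches every file directly in that directory, but not in its subdirectories)
nagios ALL=(ALL) NOPASSWD: /usr/lib/nagios/plugins/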

5. Now issue a test command on the remote server to verify that everything there is set up correctly:

/usr/lib/nagios/plugins/check_nrpe -H localhost -c restart_service_tracker
This particular event handler outputs 'OK', exits with code 0 and writes a log entry to /icinga/eh_restart_service.log. Consult the source code of your own event handler script to find out how to tell that it works ;-)

6. If you are still unsure whether it works, stop the service manually and issue the command from step 5 again. Then check the service status and calm down.
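The steps above assume an existing eh_restart_service script, whose source is not reproduced here. For anyone writing their own, a minimal sketch with the same interface (-s <service>, prints 'OK', exits 0, logs to /icinga/eh_restart_service.log) might look like this. It is hypothetical, not the original script: it assumes the SysV-style `service <name> restart` command, and the LOGFILE and RESTART_CMD variables are made-up overrides that only exist so the logic can be tried without touching a real service.

```sh
#!/bin/sh
# Hypothetical sketch of an eh_restart_service-style handler (NOT the original script).
# Restarts the service given by -s, logs the result and returns a Nagios-style
# exit code: 0 = OK, 2 = CRITICAL, 3 = UNKNOWN.

# Assumed defaults; both can be overridden from the environment for testing
LOGFILE="${LOGFILE:-/icinga/eh_restart_service.log}"
RESTART_CMD="${RESTART_CMD:-service}"   # called as: $RESTART_CMD <name> restart

eh_restart_service() {
    svc=""
    OPTIND=1                            # reset getopts state for repeated calls
    while getopts "s:" opt; do
        case "$opt" in
            s) svc="$OPTARG" ;;
            *) echo "UNKNOWN: usage: eh_restart_service -s <service>"; return 3 ;;
        esac
    done
    if [ -z "$svc" ]; then
        echo "UNKNOWN: usage: eh_restart_service -s <service>"
        return 3
    fi

    # RESTART_CMD is left unquoted on purpose so it may contain a command with options
    if $RESTART_CMD "$svc" restart >/dev/null 2>&1; then
        echo "$(date '+%Y-%m-%d %H:%M:%S') restarted $svc" >> "$LOGFILE"
        echo "OK"
        return 0
    fi
    echo "$(date '+%Y-%m-%d %H:%M:%S') failed to restart $svc" >> "$LOGFILE"
    echo "CRITICAL: restart of $svc failed"
    return 2
}
```

In the installed /usr/lib/nagios/plugins/eh_restart_service file, the function would be followed by a single call line, `eh_restart_service "$@"`, so that its return code becomes the script's exit code. Because the nrpe.cfg line passes no state information, such a handler restarts the service every time Nagios runs the event handler, which according to the Nagios documentation also includes the moment the service recovers.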