Install Apache and related packages

apt-get install apache2
apt-get install libapache2-mod-php5
apt-get install build-essential
apt-get install libgd2-xpm-dev
 
 
Add the required users and groups
useradd -m -s /usr/sbin/nologin nagios
usermod -G nagios nagios
/usr/sbin/groupadd nagcmd
/usr/sbin/usermod -a -G nagcmd nagios
/usr/sbin/usermod -a -G nagcmd www-data
 
Download the Nagios source and plugins
wget http://osdn.dl.sourceforge.net/sourceforge/nagios/nagios-3.2.0.tar.gz
wget http://osdn.dl.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-1.4.11.tar.gz
 
Compile and install (run inside the extracted nagios-3.2.0 source directory)
./configure --with-command-group=nagcmd
make all
make install
make install-init
make install-config
make install-commandmode
 
 
The default configuration files work as-is.
 
Edit the email address in the contacts file; alert emails are sent to it
vi /usr/local/nagios/etc/objects/contacts.cfg
 
Configure the Apache web interface for Nagios
make install-webconf
 
The config file was installed to /etc/httpd/conf.d/nagios.conf; copy it into Apache's conf.d directory
cp /etc/httpd/conf.d/nagios.conf /etc/apache2/conf.d
 
Add a web login user
htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
 
Reload Apache
/etc/init.d/apache2 reload
 
Compile and install the plugins
cd nagios-plugins-1.4.11
./configure --with-nagios-user=nagios --with-nagios-group=nagios
make
make install
 
 
Enable start at boot
update-rc.d nagios defaults 99 20
 
Verify the configuration
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
 
If there are no errors, start Nagios
/etc/init.d/nagios start
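
The verify-then-start sequence above can be wrapped so that Nagios is only started when the check passes. This is a minimal sketch and not part of the original notes: the function name start_if_valid and the NAGIOS_BIN, NAGIOS_CFG, and NAGIOS_INIT variables are assumptions introduced for illustration, defaulting to the paths used above. It also assumes that "nagios -v" exits non-zero when verification fails.

```shell
#!/bin/bash
# Paths default to the ones used in these notes; override them if yours differ.
NAGIOS_BIN="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
NAGIOS_CFG="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"
NAGIOS_INIT="${NAGIOS_INIT:-/etc/init.d/nagios}"

# Hypothetical helper: start Nagios only if the config check exits successfully.
start_if_valid() {
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG" >/dev/null; then
        "$NAGIOS_INIT" start
    else
        echo "configuration check failed, not starting" >&2
        return 1
    fi
}
```

Calling start_if_valid as root would then replace the two manual steps; on failure, rerun the -v command by hand to read the full error output.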
 
Test access in a browser
http://172.17.1.202/nagios/
 
If you hit errors, check the Nagios log and the Apache log.
 
Install NRPE. It is needed for monitoring remote hosts, not for monitoring the local machine.
cd nrpe-2.12
./configure --enable-ssl=no --with-nagios-user=nagios --with-nagios-group=nagios
make all
make install-plugin
make install-daemon
make install-daemon-config
 
Start NRPE without SSL
/usr/local/nagios/bin/nrpe -n -c /usr/local/nagios/etc/nrpe.cfg -d
 
The monitoring server itself does not need to run the NRPE daemon.
 
Test check_nrpe
/usr/local/nagios/libexec/check_nrpe -n -H localhost
NRPE v2.12
 
================Install mail sending (optional)===========
apt-get install mailx
apt-get install postfix
vi /usr/local/nagios/etc/objects/commands.cfg
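:%s/\/bin\/mail/\/usr\/bin\/mail/g
This changes /bin/mail to /usr/bin/mail.
In the end I did not use the method above.
Instead, mail is sent over telnet.
The username and password must be base64-encoded: http://maclife.net/tools/base64/
Script contents: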
#!/bin/bash
IP="smtp.163.com"
PORT="25"
USER="test@163.com"
USER64="dGVzdEAxNjMuY29t"
PASD64="eDNsSo="
TOUSER="test11@qq.com"
TOUSER1="test22@qq.com"
Date=`date +%Y-%m-%d_%T`
# Drive one SMTP session per recipient over telnet: HELO, AUTH LOGIN, envelope, message body, QUIT.
(sleep 1;echo "helo hello";sleep 1;echo "auth login";sleep 1;echo "$USER64";sleep 1;echo "$PASD64";sleep 1;echo "mail from: <$USER>";sleep 1;echo "rcpt to: <$TOUSER>";sleep 1;echo "data";sleep 1;echo -e "subject: $1\n";sleep 1;echo -e "nagios:\nType:$1\nHost:$2\nState:$3\nAddress:$4\nInfo:$5\nDate/Time:$6\n$Date";echo ".";echo "quit") | telnet $IP $PORT
(sleep 1;echo "helo hello";sleep 1;echo "auth login";sleep 1;echo "$USER64";sleep 1;echo "$PASD64";sleep 1;echo "mail from: <$USER>";sleep 1;echo "rcpt to: <$TOUSER1>";sleep 1;echo "data";sleep 1;echo -e "subject: $1\n";sleep 1;echo -e "nagios:\nType:$1\nHost:$2\nState:$3\nAddress:$4\nInfo:$5\nDate/Time:$6\n$Date";echo ".";echo "quit") | telnet $IP $PORT
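
The base64 values for the script can also be produced locally instead of with an online tool. A minimal sketch, assuming the base64 utility from coreutils is installed; the helper name b64 is introduced here for illustration only.

```shell
#!/bin/bash
# Encode a string as base64.
# printf is used instead of echo so that no trailing newline gets encoded.
b64() {
    printf '%s' "$1" | base64
}

b64 'test@163.com'   # prints dGVzdEAxNjMuY29t, the USER64 value in the script
```

The same call with the real password gives the value for PASD64.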
 
================================================
 
Installation on the monitored host
 
useradd -m -s /usr/sbin/nologin nagios
 
cd nagios-plugins-1.4.11
./configure --with-nagios-user=nagios --with-nagios-group=nagios
make
make install
 
chown -R nagios:nagios /usr/local/nagios
 
cd nrpe-2.12
 
./configure --enable-ssl=no --with-nagios-user=nagios --with-nagios-group=nagios
make all
make install-plugin
make install-daemon
make install-daemon-config
 
Start NRPE without SSL
/usr/local/nagios/bin/nrpe -n -c /usr/local/nagios/etc/nrpe.cfg -d
 
Test check_nrpe
/usr/local/nagios/libexec/check_nrpe -H localhost
NRPE v2.12
 
 
 
Test from the monitoring server
 
/usr/local/nagios/libexec/check_nrpe -n -H 172.17.1.201
Refused
 
First, on the monitored host, add the IP of the host that is allowed to connect
vi nrpe.cfg
allowed_hosts=127.0.0.1,172.17.1.202
 
/usr/local/nagios/libexec/check_nrpe -n -H 172.17.1.201
NRPE v2.12
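
Before retrying from the monitoring server, it can help to confirm that its IP really appears in allowed_hosts. The following is a minimal sketch that models only a literal comparison against the comma-separated list; it is not NRPE's actual implementation, and the function name ip_allowed is hypothetical.

```shell
#!/bin/bash
# Return 0 if the given IP is listed literally in allowed_hosts, 1 otherwise.
# Usage: ip_allowed /path/to/nrpe.cfg 172.17.1.202
ip_allowed() {
    local cfg="$1" ip="$2" line hosts h
    # Pick up the allowed_hosts line; a later occurrence overrides an earlier one here.
    while IFS= read -r line; do
        case "$line" in
            allowed_hosts=*) hosts="${line#allowed_hosts=}" ;;
        esac
    done < "$cfg"
    # Split the value on commas and compare each entry with the IP.
    local IFS=','
    for h in $hosts; do
        [ "$h" = "$ip" ] && return 0
    done
    return 1
}
```

For example, ip_allowed /usr/local/nagios/etc/nrpe.cfg 172.17.1.202 should succeed once the line above has been added.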
 
Configure the monitoring server
 
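How it works, briefly:
The monitoring server's check_nrpe sends a command name to the NRPE daemon on the monitored host; the daemon looks that name up in its nrpe.cfg and runs the matching command.
The monitored host therefore needs the relevant plugins installed and nrpe.cfg configured.
Some plugins need no entry in the monitored host's nrpe.cfg, for example check_ping and check_http, since the monitoring server can run them directly against the remote host.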
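
The lookup step described above can be illustrated with a small shell function that does what the daemon conceptually does: map a command name to the command line defined for it in nrpe.cfg. This is a sketch for understanding and debugging only, not how NRPE itself is implemented; the function name nrpe_lookup is hypothetical.

```shell
#!/bin/bash
# Print the command line that an nrpe.cfg-style file defines for a command name.
# Usage: nrpe_lookup /path/to/nrpe.cfg check_df
nrpe_lookup() {
    local cfg="$1" name="$2" line
    while IFS= read -r line; do
        case "$line" in
            # Definitions look like: command[check_df]=/path/to/plugin args
            "command[$name]="*) printf '%s\n' "${line#*=}" ;;
        esac
    done < "$cfg"
}
```

If the function prints nothing for a name that the monitoring server requests, the monitored host has no matching definition and the check cannot run.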
 
Verify the configuration
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
 
Configure monitoring
 
Monitored host
Download the script and put it in libexec
 
Test the command
check_mem.pl
./check_mem.pl -w 95,60 -c 120,80 -v
 
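Add the commands the monitored host should execute, defining each command's name, plugin path, and arguments. To pass arguments ($ARG1$ and so on), NRPE must be compiled with --enable-command-args and nrpe.cfg must set dont_blame_nrpe=1.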
vi nrpe.cfg
command[check_df]=/usr/local/nagios/libexec/check_disk -w 20% -c 10%
command[check_mem]=/usr/local/nagios/libexec/check_mem.pl -w 90,30 -c 95,50 -v
 
Count all connections on a given port
command[check_netstat]=/usr/local/nagios/libexec/check_netstat.pl -p '>'$ARG1$ -w $ARG2$ -c $ARG3$
 
Count ESTABLISHED connections on a given port
command[check_netstat_ESTABLISHED]=/usr/local/nagios/libexec/check_netstat.pl -p '>'$ARG1$ -w $ARG2$ -c $ARG3$ -e
command[check_netstat]=/usr/local/nagios/libexec/check_netstat.pl -p 80 -w 400 -c 800
command[check_netstat_ESTABLISHED]=/usr/local/nagios/libexec/check_netstat.pl -p 80 -w 200 -c 500 -e
 
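Fetch a page and search it for a given string; the check is OK if the string is found. Example response-time thresholds (seconds): warning 11, critical 21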
command[check_http]=/usr/local/nagios/libexec/check_http -H $ARG1$ -r $ARG2$ -w $ARG3$ -c $ARG4$
 
Check the number of processes owned by a user
command[check_user_procs]=/usr/local/nagios/libexec/check_procs -u $ARG1$
 
Check DNS status
command[check_bind]=/usr/local/nagios/libexec/check_bind.sh -p /var/run/bind/run/ -n named.172.pid -s /etc/bind
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
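
Each command above runs a plugin that reports its result through the exit code: 0 for OK, 1 for WARNING, 2 for CRITICAL (3 is UNKNOWN). The sketch below shows how -w and -c style thresholds map onto those codes. It is a simplified illustration with a hypothetical function name, check_threshold, using a "greater than" comparison; real plugins each define their own threshold semantics.

```shell
#!/bin/bash
# Compare an integer value against warning and critical thresholds and
# report the result the way a Nagios plugin does: one status line plus exit code.
# Usage: check_threshold VALUE WARN CRIT
check_threshold() {
    local value="$1" warn="$2" crit="$3"
    if [ "$value" -gt "$crit" ]; then
        echo "CRITICAL - value=$value"
        return 2
    elif [ "$value" -gt "$warn" ]; then
        echo "WARNING - value=$value"
        return 1
    fi
    echo "OK - value=$value"
    return 0
}
```

With the check_users thresholds above (-w 5 -c 10), check_threshold 7 5 10 reports WARNING and returns 1.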
 
 
Monitoring server
 
Add a host definition file and a service definition file
vi nagios.cfg
#cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
cfg_file=/usr/local/nagios/etc/objects/hosts.cfg
cfg_file=/usr/local/nagios/etc/objects/services.cfg
command_check_interval=10s
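
After editing nagios.cfg it is easy to reference an object file that does not exist yet. A minimal sketch that lists the active (uncommented) cfg_file entries so they can be checked; the function name list_cfg_files is hypothetical.

```shell
#!/bin/bash
# Print the path of every active cfg_file= entry in a nagios.cfg-style file.
# Lines starting with '#' do not match the pattern and are skipped.
list_cfg_files() {
    local line
    while IFS= read -r line; do
        case "$line" in
            cfg_file=*) printf '%s\n' "${line#cfg_file=}" ;;
        esac
    done < "$1"
}
```

A possible use is to loop over the output and test each path with [ -f "$path" ] before running the nagios -v check.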
 
Add the check_nrpe command definition to the commands file
vi commands.cfg
define command{
command_name     check_nrpe
command_line     $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
define command{
command_name     check_bind
command_line     /usr/local/nagios/libexec/check_bind.sh -p /var/run/bind/run/ -n named.172.pid -s /etc/bind
}
 
define command{
command_name     check_web
command_line     $USER1$/check_http -H $HOSTADDRESS$ -r $ARG1$ -w $ARG2$ -c $ARG3$
}
check_http -H www.dledu.com -r 2008 -w 1 -c 2
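
To see what a command definition turns into at run time, the macro substitution can be imitated in the shell. This is a rough sketch of the idea, not Nagios's own code: it handles only $USER1$, $HOSTADDRESS$, and $ARGn$, and it assumes $USER1$ is /usr/local/nagios/libexec as in the sample resource.cfg. The function name expand_command is hypothetical.

```shell
#!/bin/bash
# Expand a command_line template for one host and one check_command value.
# Usage: expand_command TEMPLATE HOSTADDRESS 'command_name!arg1!arg2...'
expand_command() {
    local template="$1" address="$2" check="$3"
    local user1="/usr/local/nagios/libexec"   # assumed value of $USER1$
    local parts i line
    # Split check_command on '!': parts[0] is the command name, the rest are arguments.
    IFS='!' read -r -a parts <<< "$check"
    line="${template//\$USER1\$/$user1}"
    line="${line//\$HOSTADDRESS\$/$address}"
    for ((i = 1; i < ${#parts[@]}; i++)); do
        line="${line//\$ARG$i\$/${parts[i]}}"
    done
    printf '%s\n' "$line"
}

expand_command '$USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$' 172.17.1.201 'check_nrpe!check_df'
# prints /usr/local/nagios/libexec/check_nrpe -H 172.17.1.201 -c check_df
```

Because the check_nrpe template references only $ARG1$, any further !-separated values in a service's check_command would not appear in the resulting command line.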
 
Configure mail sending===================================================
Install the custom mail-sending scripts
cp mail_host.sh mail_ser.sh /bin/
chown nagios /bin/mail_*
chmod u+x /bin/mail_*
 
define command{
        command_name    notify-host-by-email
        command_line    /bin/mail_host.sh "$NOTIFICATIONTYPE$" "$HOSTNAME$" "$HOSTSTATE$" "$HOSTADDRESS$" "$HOSTOUTPUT$"
}
 
define command{
        command_name    notify-service-by-email
        command_line    /bin/mail_ser.sh "$NOTIFICATIONTYPE$" "$SERVICEDESC$" "$HOSTALIAS$" "$HOSTADDRESS$" "$SERVICESTATE$" "$LONGDATETIME$" "$SERVICEOUTPUT$"
}
 
 
Edit the contacts file and add your own email address
vi contacts.cfg
define contact{
        contact_name                    nagiosadmin            
#        use                             generic-contact        
        alias                           Nagios Admin          
        host_notifications_enabled      1
        service_notifications_enabled   1
        service_notification_period     24x7
        host_notification_period        24x7
        service_notification_options    w,u,c,r
        host_notification_options       d,u,r
        service_notification_commands   notify-service-by-email
        host_notification_commands      notify-host-by-email
        email                           dlsxw@qq.com           
        }
==========================================================================
=========================Repeat this for every host you add===========================
Define the hosts to monitor and their services
 
vi hosts.cfg
define host{
host_name        test-server
alias            test server
address          172.17.1.201
check_command    check-host-alive
max_check_attempts       5
notification_interval    10
notification_period      24x7
notification_options     d,u,r
contacts                 nagiosadmin
}
 
===============Services to check on each host=========================
The main services monitored are process, load, disk, and http
 
vi services.cfg
 
define service{
host_name test-server
service_description      memory
check_command            check_nrpe!check_mem!110,50!150,80
check_period     24x7
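max_check_attempts       4                 ; after 4 consecutive failed checks the problem is treated as confirmed
normal_check_interval    3                 ; check every 3 minutes under normal conditions
retry_check_interval     2                 ; after a failed check, recheck every 2 minutes
notification_interval    10                ; if still not recovered after 10 minutes, send the alert again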
notification_period      24x7
notification_options     w,u,c,r
contacts                 nagiosadmin
}
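
The timing settings in the definition above interact: a problem is first rechecked at the retry interval, and only once max_check_attempts is reached is it treated as confirmed. A small arithmetic sketch of that delay, assuming the intervals are in minutes (the default interval_length of 60 seconds); the function name minutes_to_hard_state is hypothetical.

```shell
#!/bin/bash
# Minutes between the first failed check and the check that confirms the problem.
# The first failure counts as attempt 1, so (max_attempts - 1) retries follow.
minutes_to_hard_state() {
    local max_attempts="$1" retry_interval="$2"
    echo $(( (max_attempts - 1) * retry_interval ))
}

minutes_to_hard_state 4 2   # prints 6
```

So with max_check_attempts 4 and retry_check_interval 2, roughly 6 minutes pass between the first failure and the confirming check.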
define service{
host_name test-server
service_description      disk
check_command            check_nrpe!check_df
check_period     24x7
max_check_attempts       4
normal_check_interval    3
retry_check_interval     2
notification_interval    10
notification_period      24x7
notification_options     w,u,c,r
contacts                 nagiosadmin
}
define service{
host_name                test-server
service_description      load
check_command            check_nrpe!check_load
check_period     24x7
max_check_attempts       4
normal_check_interval    3
retry_check_interval     2
notification_interval    10
notification_period      24x7
notification_options     w,u,c,r
contacts                 nagiosadmin
}
define service{
host_name                test-server
service_description      netstat
check_command            check_nrpe!check_netstat
check_period     24x7
max_check_attempts       4
normal_check_interval    3
retry_check_interval     2
notification_interval    10
notification_period      24x7
notification_options     w,u,c,r
contacts                 nagiosadmin
}
define service{
host_name                test-server
service_description      netstat_ESTABLISHED
check_command            check_nrpe!check_netstat_ESTABLISHED
check_period     24x7
max_check_attempts       4
normal_check_interval    3
retry_check_interval     2
notification_interval    10
notification_period      24x7
notification_options     w,u,c,r
contacts                 nagiosadmin
}
define service{
host_name                test-server
service_description      www.dledu.com
check_command            check_http
check_period     24x7
max_check_attempts       4
normal_check_interval    1
retry_check_interval     1
notification_interval    10
notification_period      24x7
notification_options     w,u,c,r
contacts                 nagiosadmin
}
 
define service{
host_name                test-server
service_description      procs_apache
check_command            check_nrpe!check_apache_procs
check_period     24x7
max_check_attempts       4
normal_check_interval    3
retry_check_interval     2
notification_interval    10
notification_period      24x7
notification_options     w,u,c,r
contacts                 nagiosadmin
}
define service{
host_name                test-server
service_description      bind_status
check_command            check_bind
check_period     24x7
max_check_attempts       4
normal_check_interval    3
retry_check_interval     2
notification_interval    10
notification_period      24x7
notification_options     w,u,c,r
contacts                 nagiosadmin
}
define service{
host_name                test-server
service_description      procs_total
check_command            check_nrpe!check_total_procs
check_period     24x7
max_check_attempts       4
normal_check_interval    3
retry_check_interval     2
notification_interval    10
notification_period      24x7
notification_options     w,u,c,r
contacts                 nagiosadmin
}
define service{
host_name                test-server
service_description      procs_zombie
check_command            check_nrpe!check_zombie_procs
check_period     24x7
max_check_attempts       4
normal_check_interval    3
retry_check_interval     2
notification_interval    10
notification_period      24x7
notification_options     w,u,c,r
contacts                 nagiosadmin
}
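
Since most of the service blocks above differ only in service_description and check_command, they can be generated rather than typed by hand. A minimal sketch, assuming the shared settings used throughout this file; the function name gen_service is hypothetical, and the output should be reviewed before it is appended to services.cfg.

```shell
#!/bin/bash
# Print one service definition with the settings shared by the blocks above.
# Usage: gen_service HOST_NAME SERVICE_DESCRIPTION CHECK_COMMAND
gen_service() {
    local host="$1" desc="$2" check="$3"
    cat <<EOF
define service{
host_name                $host
service_description      $desc
check_command            $check
check_period             24x7
max_check_attempts       4
normal_check_interval    3
retry_check_interval     2
notification_interval    10
notification_period      24x7
notification_options     w,u,c,r
contacts                 nagiosadmin
}
EOF
}
```

For example, gen_service test-server load 'check_nrpe!check_load' reproduces the load block above.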
 
 
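Windows monitored host
First install NSClient++-0.3.7-Win32.msi. During installation, select the first three modules and specify the address of the Nagios server that is allowed to monitor this host.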
 
 
Nagios server:
Define the items to monitor
Edit /usr/local/nagios/etc/nagios.cfg and add:
cfg_file=/usr/local/nagios/etc/objects/windows.cfg
 
For check_nt usage, see the built-in help:
# /usr/local/nagios/libexec/check_nt -h
Some commonly used parameters:
1) Monitor how long the Windows server has been up
check_command            check_nt!UPTIME
2) Monitor CPU load on the Windows server: warning if the 5-minute average exceeds 80%, critical if it exceeds 90%
check_command            check_nt!CPULOAD!-l 5,80,90
3) Monitor memory usage on the Windows server: warning above 80%, critical above 90%
check_command            check_nt!MEMUSE!-w 80 -c 90
4) Monitor usage of the C:\ drive: warning when more than 80% is used, critical above 90%
check_command            check_nt!USEDDISKSPACE!-l c -w 80 -c 90
Note: the argument after -l specifies the drive letter
5) Monitor usage of the D:\ drive: warning when more than 80% is used, critical above 90%
check_command            check_nt!USEDDISKSPACE!-l d -w 80 -c 90
6) Monitor the state of the W3SVC service: critical if the service is stopped
check_command            check_nt!SERVICESTATE!-d SHOWALL -l W3SVC
7) Monitor the Explorer.exe process: critical if the process is not running
check_command            check_nt!PROCSTATE!-d SHOWALL -l Explorer.exe
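
Before adding these to windows.cfg, each check can be tried by hand from the Nagios server. The sketch below builds the command line that corresponds to a check_nt!VARIABLE!PARAMS entry. It assumes the check_nt definition from the sample commands.cfg, which passes -p 12489 and -v $ARG1$ $ARG2$; verify that against your own commands.cfg. The function name nt_command_line and the host address in the example are hypothetical.

```shell
#!/bin/bash
# Build the check_nt command line for manual testing.
# Usage: nt_command_line HOST VARIABLE [PARAMS]
nt_command_line() {
    local host="$1" variable="$2" params="$3"
    local line="/usr/local/nagios/libexec/check_nt -H $host -p 12489 -v $variable"
    if [ -n "$params" ]; then
        line="$line $params"
    fi
    printf '%s\n' "$line"
}

nt_command_line 192.168.0.10 CPULOAD '-l 5,80,90'
# prints /usr/local/nagios/libexec/check_nt -H 192.168.0.10 -p 12489 -v CPULOAD -l 5,80,90
```

Running the printed command should return the same status line that Nagios would record for the service.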