4.1 Adding monitoring nodes

++ TEST

menu = TEST

title = TEST

################web server############################

+++ TEST-web-bbs

menu = TEST-web-bbs

title = TEST website 10.0.100.10

host = 61.160.248.10

+++ TEST-web-main

menu = TEST site main web server

title = TEST site main web server 10.0.100.21

host = 10.0.100.21

+++ TEST-web-admin

menu = TEST site admin back-end web server

title = TEST site admin back-end web server 10.0.100.6

host = 10.0.100.6

################double line server############################

+++ TEST-web-static

menu = TEST dual-line official site and static-file web server

title = TEST dual-line official site and static-file web server

#host = /cy2009/TEST/TEST-web-static-union \

# /TEST/TEST-web-static-telecom

++++ TEST-web-static-telecom

menu = TEST China Telecom official site and static-file web server

title = TEST China Telecom official site and static-file web server 10.0.100.101

host = 10.0.100.101

++++ TEST-web-static-union

menu = TEST China Netcom official site and static-file web server

title = TEST China Netcom official site and static-file web server 10.0.100.159

host = 10.0.100.159

################database############################

+++ TEST-db-master

menu = TEST site DB master server

title = TEST site DB master server 10.0.100.7

host = 10.0.100.7

+++ TEST-db-slave

menu = TEST site DB slave server

title = TEST site DB slave server 10.0.100.11

host = 10.0.100.11

################other############################

+++ TEST-memcached

menu = TEST-memcached server

title = TEST memcached and SVN server 10.0.100.22

host = 10.0.100.22
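
In the Targets section the nesting level of a node is given by the number of leading + signs, and each node carries menu, title and, for leaf nodes, host. As a minimal sketch, stanzas like the ones above could be generated from a host list. The helper name render_target and the sample labels are illustrative assumptions, not part of SmokePing.

```python
def render_target(depth, name, menu, title, host=None):
    """Render one SmokePing Targets node as config text.

    depth is the number of '+' signs (the nesting level). host is left
    out for pure grouping nodes such as '++ TEST'.
    """
    lines = ["+" * depth + " " + name, f"menu = {menu}", f"title = {title}"]
    if host is not None:
        lines.append(f"host = {host}")
    return "\n".join(lines) + "\n"


# Sample data modeled on the database entries above (labels are illustrative).
nodes = [
    (3, "TEST-db-master", "TEST site DB master", "TEST site DB master 10.0.100.7", "10.0.100.7"),
    (3, "TEST-db-slave", "TEST site DB slave", "TEST site DB slave 10.0.100.11", "10.0.100.11"),
]
config_text = "\n".join(render_target(*node) for node in nodes)
print(config_text)
```

Generating the stanzas keeps the + depth and the three keys consistent when many hosts are added at once.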

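4.2 Alert settings

SmokePing's alert configuration is somewhat complex, but it works well: it is flexible and thoroughly thought out. It can send alerts by email, or call an external program directly to raise an IM alert. Our monitoring mainly uses email alerts. Because the Qinghedong (清河东) link has strict real-time requirements, we send the alert mail to a 139 mailbox, which forwards it as an SMS to a mobile phone, giving us SMS alerting. The alert parameters are shown below; for any node that needs alerting, add alerts = manyloss to that node.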

*** Alerts ***

to = zouyunhao@aspire-tech.com,minliang@aspire-tech.com

from = smokealert@192.168.2.14

+someloss

type = loss

pattern = >0%,*30*,>0%,*30*,>0% # in percent

comment = loss 1 packages in 30 continuous 3 times.

+manyloss

type = loss

pattern = >15%,*30*,>15%,*30*,>15% # in percent

comment = loss 5 packages in 30 continuous 3 times.

+rttbad

type = rtt

pattern = ==S,>50,>50 # in milliseconds

comment = For more than two consecutive 50-millisecond delay.

(1) to is the mailbox that receives all alerts. To send the alerts of a particular node to a specific mailbox, add alertee = 13828466531@139.com to that node.

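(2) manyloss sends an alert when packet loss above 15% occurs three times, with up to 30 samples of any value allowed between occurrences (the *30* element).

(3) someloss sends an alert when any packet loss (above 0%) occurs three times, again with up to 30 samples allowed between occurrences; rttbad sends an alert when the delay exceeds 50 ms in two consecutive samples.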
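
To make the loss patterns easier to reason about, here is a simplified matcher written for illustration only. It is not SmokePing's own code. It assumes that >N% matches one sample with loss above N percent, that *N* matches up to N samples of any value, and that the pattern is compared against the most recent samples. The function names parse_pattern and matches are hypothetical.

```python
import re


def parse_pattern(pattern):
    """Split a loss pattern such as '>15%,*30*,>15%' into tokens.

    Returns a list of ('gt', threshold) and ('any', max_count) tuples.
    Only the two element forms used in this section are handled.
    """
    tokens = []
    for part in pattern.split(","):
        part = part.strip()
        wildcard = re.fullmatch(r"\*(\d+)\*", part)
        threshold = re.fullmatch(r">(\d+(?:\.\d+)?)%", part)
        if wildcard:
            tokens.append(("any", int(wildcard.group(1))))
        elif threshold:
            tokens.append(("gt", float(threshold.group(1))))
        else:
            raise ValueError(f"unsupported pattern element: {part!r}")
    return tokens


def matches(tokens, history):
    """Return True if the newest samples in history fit the tokens.

    history is a list of loss percentages, oldest first. Matching is
    anchored at the newest sample and walks backwards through both lists.
    """
    def walk(t, h):
        # t tokens and h samples are still unmatched.
        if t == 0:
            return True
        kind, value = tokens[t - 1]
        if kind == "gt":
            return h > 0 and history[h - 1] > value and walk(t - 1, h - 1)
        # Wildcard: skip between 0 and `value` samples of any value.
        return any(walk(t - 1, h - skip) for skip in range(min(value, h) + 1))

    return walk(len(tokens), len(history))


manyloss = parse_pattern(">15%,*30*,>15%,*30*,>15%")
print(matches(manyloss, [0, 20, 0, 0, 20, 0, 20]))
print(matches(manyloss, [20, 20, 20, 0]))
```

Under these assumptions the first history matches, because three samples exceed 15% and the newest one is among them, while the second does not, because the newest sample shows no loss.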

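4.3 Graph settings

By default SmokePing draws a graph every 5 minutes and sends 20 ping packets in each 5-minute interval. Our network engineers considered 20 pings per 5 minutes too few and recommended 100 per 5 minutes. The graph colors and related settings need to be adjusted accordingly. In the Database section, change step = 300, pings = 20 to step = 300, pings = 100.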
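
As a small sketch of that change, the pings value can be rewritten in the config text with a regular expression. The helper name set_pings and the inline sample section are assumptions for illustration; a real config file would be read from disk and written back.

```python
import re


def set_pings(config_text, pings):
    """Return config_text with the value of every 'pings = N' line replaced."""
    return re.sub(
        r"(?m)^([ \t]*pings[ \t]*=[ \t]*)\d+",
        lambda m: m.group(1) + str(pings),
        config_text,
    )


# Sample Database section using the default values mentioned above.
database_section = "*** Database ***\n\nstep = 300\npings = 20\n"
updated = set_pings(database_section, 100)
print(updated)
```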

4.4 Master/slave mode

[Figure: SmokePing master/slave architecture]

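As the figure shows, each slave host probes the monitored targets itself (loss and RTT) and submits the results to the master host through smokeping.cgi. Note that a slave does not need its own config file: each time it finishes submitting data, it asks the master whether its configuration has changed, and updates itself if it has.

Master configuration: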

*** Probes ***

+ FPing

binary = /usr/sbin/fping

sourceaddress = 0.0.0.0

*** Slaves ***

secrets=/usr/local/smokeping/etc/smokeping_secrets.dist

+10.0.100.146

display_name=10.0.100.146

location=tangshan

color=ff0000

++override

Probes.FPing.binary = /usr/sbin/fping

Probes.FPing.sourceaddress = 10.0.100.146 # Tangshan China Telecom line

+10.0.101.146

display_name=10.0.101.146

location=tangshan

color=ffff00

++override

Probes.FPing.binary = /usr/sbin/fping

Probes.FPing.sourceaddress = 10.0.101.146 # Tangshan China Netcom line

+10.0.100.93

display_name=10.0.100.93

location=zhongshan

color=ff0000

++override

Probes.FPing.binary = /usr/sbin/fping

Probes.FPing.sourceaddress = 10.0.100.93 # Zhongshan China Telecom line

+10.0.100.125

display_name=10.0.100.125

location=zhongshan

color=ffff00

++override

Probes.FPing.binary = /usr/sbin/fping

Probes.FPing.sourceaddress = 10.0.100.125 # Zhongshan China Netcom line

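Slave configuration:

After the software is installed on a slave, its configuration file needs no changes; only one command has to be run for each slave instance.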

Slave1 configuration:

# Zhongshan China Netcom line:

mkdir -p /usr/local/smokeping/cache-wt

smokeping --master-url=http://10.0.100.8/smokeping/smokeping.cgi \
  --cache-dir=/usr/local/smokeping/cache-wt \
  --shared-secret=/usr/local/smokeping/etc/smokeping_secrets.dist \
  --slave-name=10.0.100.125

# Zhongshan China Telecom line:

mkdir -p /usr/local/smokeping/cache-dx

smokeping --master-url=http://10.0.100.8/smokeping/smokeping.cgi \
  --cache-dir=/usr/local/smokeping/cache-dx \
  --shared-secret=/usr/local/smokeping/etc/smokeping_secrets.dist \
  --slave-name=10.0.100.93

chown -R apache:apache /usr/local/smokeping

Slave2 configuration:

# Tangshan China Netcom line:

mkdir -p /usr/local/smokeping/cache-wt

smokeping --master-url=http://10.0.100.8/smokeping/smokeping.cgi \
  --cache-dir=/usr/local/smokeping/cache-wt \
  --shared-secret=/usr/local/smokeping/etc/smokeping_secrets.dist \
  --slave-name=10.0.101.146

# Tangshan China Telecom line:

mkdir -p /usr/local/smokeping/cache-dx

smokeping --master-url=http://10.0.100.8/smokeping/smokeping.cgi \
  --cache-dir=/usr/local/smokeping/cache-dx \
  --shared-secret=/usr/local/smokeping/etc/smokeping_secrets.dist \
  --slave-name=10.0.100.146

chown -R apache:apache /usr/local/smokeping
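
The four slave commands differ only in the cache directory and the slave name. The following sketch builds them from one template, using the same options, master URL and secrets path as the commands above. The function name slave_command and the lines list are assumptions made for illustration.

```python
import shlex

MASTER_URL = "http://10.0.100.8/smokeping/smokeping.cgi"
SECRET_FILE = "/usr/local/smokeping/etc/smokeping_secrets.dist"


def slave_command(slave_name, cache_dir):
    """Build the shell command line that starts one SmokePing slave instance."""
    args = [
        "smokeping",
        f"--master-url={MASTER_URL}",
        f"--cache-dir={cache_dir}",
        f"--shared-secret={SECRET_FILE}",
        f"--slave-name={slave_name}",
    ]
    # shlex.join quotes any argument that needs it for the shell.
    return shlex.join(args)


# (slave name, cache directory) pairs taken from the commands above.
lines = [
    ("10.0.100.125", "/usr/local/smokeping/cache-wt"),
    ("10.0.100.93", "/usr/local/smokeping/cache-dx"),
]
for name, cache in lines:
    print(slave_command(name, cache))
```

Each slave name has to match an entry in the Slaves section of the master configuration, and each instance on the same host needs its own cache directory.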

4.5 Traceroute configuration

Add traceroute support to the configuration file:

*** Targets ***

probe = FPing

menu = Top

title = Network Latency Grapher

+ TESTN

menu= TESTN_smokeping

title = TESTN Network Latency Grapher

menuextra = <a target='_blank' href='tr.html{HOST}' class='{CLASS}' onclick="window.open(this.href,this.target, ',toolbar=no,location=no,status=no,scrollbars=no'); return false;">*traceroute*</a>