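The boss handed me a task: tally each day's outages and compute a service-stability metric.
The statistics are derived from the logs recorded by the monitoring system.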

The log format looks roughly like this:
corp_resin_a:10.11.15.35:6805 is down! 2008-08-02-20:55:16
mx7.cmail.sogou.com:10.11.15.34:6805 is down! 2008-08-02-20:55:26
mx5.cmail.sogou.com:10.11.15.46:6805 is down! 2008-08-02-20:55:26
mx10.cmail.sogou.com:192.168.95.143:6805 is down! 2008-08-02-20:55:26
corp_resin_d:192.168.132.189:6805 is down! 2008-08-02-20:55:26
corp_resin_d:192.168.132.89:6805 is down! 2008-08-02-20:55:26
corp_resin_a:10.11.15.47:6805 is down! 2008-08-02-20:55:26
corp_resin_d:192.168.131.164:6805 is down! 2008-08-02-20:55:26
corp_resin_a:10.11.15.20:6805 is down! 2008-08-02-20:55:26
mx10.cmail.sogou.com:192.168.95.143:6805 is down! 2008-08-02-20:55:37

The monitoring script runs once a minute, so each log entry can be treated as one minute of downtime.
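Under that rule, a count of log entries converts directly into a daily availability figure. The sketch below is only one possible way to define the stability metric: the `availability` helper and the 1440-minute day are my own assumptions, not something the monitoring setup prescribes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: one log entry == one minute of downtime, one day == 1440 minutes.
my $MINUTES_PER_DAY = 24 * 60;

# availability() is a hypothetical helper, not part of the original script.
# It returns the percentage of the day a host or service was up.
sub availability {
    my ($down_entries) = @_;
    # Cap at a full day in case something is logged more than once per minute.
    $down_entries = $MINUTES_PER_DAY if $down_entries > $MINUTES_PER_DAY;
    return 100 * ($MINUTES_PER_DAY - $down_entries) / $MINUTES_PER_DAY;
}

# Example: 601 "down" entries in one day
printf "%.2f%%\n", availability(601);    # prints 58.26%
```

A host with 601 entries in a day was therefore up for only about 58% of that day.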

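I wrote it in Perl, for no reason other than practice. Each log line consists of hostname:ip:port, followed by "is down!" and then the date and time joined by a hyphen.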
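To make the line format concrete, here is a minimal sketch that picks one line apart with a single regular expression. The `parse_line` helper is hypothetical and only for illustration; it assumes every entry looks exactly like the samples above.

```perl
use strict;
use warnings;

# parse_line() is a hypothetical helper for illustration only.
# It returns (host, ip, port, date, time), or an empty list if the
# line does not look like "host:ip:port is down! YYYY-MM-DD-hh:mm:ss".
sub parse_line {
    my ($line) = @_;
    if ($line =~ /^([^:\s]+):(\d+\.\d+\.\d+\.\d+):(\d+) is down! (\d{4}-\d{2}-\d{2})-(\d{2}:\d{2}:\d{2})\s*$/) {
        return ($1, $2, $3, $4, $5);
    }
    return;
}

my ($host, $ip, $port, $date, $time) =
    parse_line("corp_resin_a:10.11.15.35:6805 is down! 2008-08-02-20:55:16");
print "$host $ip $port $date $time\n";
# prints: corp_resin_a 10.11.15.35 6805 2008-08-02 20:55:16
```

A regex like this rejects malformed lines outright, whereas splitting on spaces and colons silently yields undefined fields.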
#!/usr/bin/perl
use strict;
use warnings;

# Failure counters, one per service type
my $web        = 0;
my $pop3       = 0;
my $smtp       = 0;
my $master     = 0;
my $slave      = 0;
my $resin      = 0;
my $totaltimes = 0;
my %totalip;    # failure count per IP

my ($curlogfile) = @ARGV;
die "usage: $0 logfile\n" unless defined $curlogfile;
open(my $fh, '<', $curlogfile) or die "cannot open $curlogfile: $!\n";

while (<$fh>) {
    # Only "down" entries are failures; skip blank or unrelated lines
    next unless /down/;
    $totaltimes++;
    chomp;

    # Fields: "host:ip:port", "is", "down!", "YYYY-MM-DD-hh:mm:ss"
    my @items    = split / /;
    my $service  = $items[0];
    my $date     = $items[3];
    my @newitem1 = split /:/, $service;
    my $ip       = $newitem1[1];
    my $port     = $newitem1[2];
    my @newitem2 = split /-/, $date;
    my $time     = $newitem2[3];    # hh:mm:ss, not used in the statistics yet

    # Classify the failure by port number
    if ($port eq "2000") {
        $master++;
    }
    if ($port eq "9002") {
        $slave++;
    }
    if ($port eq "80") {
        $web++;
    }
    if ($port eq "25") {
        $smtp++;
    }
    if ($port eq "110") {
        $pop3++;
    }
    # Anchored so that only these four ports count as resin
    if ($port =~ /^(?:6802|6803|6804|6805)$/) {
        $resin++;
    }
    $totalip{$ip}++;
}

close($fh);

print "Total failures: $totaltimes\n";
# Numeric comparison (>), not the string operator gt
if ($web > 0) {
    print "WEB failures: $web\n";
}
if ($pop3 > 0) {
    print "POP3 failures: $pop3\n";
}
if ($smtp > 0) {
    print "SMTP failures: $smtp\n";
}
if ($resin > 0) {
    print "RESIN failures: $resin\n";
}
if ($master > 0) {
    print "MASTER failures: $master\n";
}
if ($slave > 0) {
    print "SLAVE failures: $slave\n";
}
print "Failed IP:      Failures\n";
foreach my $key (sort keys %totalip) {
    print "$key  $totalip{$key}\n";
}


The final statistics look like this:

coolerfeng@mail:~/log$ ./log.pl net-2008-08-02.txt
Total failures: 642
WEB failures: 1
POP3 failures: 1
RESIN failures: 39
MASTER failures: 601
Failed IP:      Failures
10.10.71.50  1
10.10.71.92  2
10.11.15.20  2
10.11.15.34  2
10.11.15.35  2
10.11.15.46  2
10.11.15.47  2
192.168.131.164  9
192.168.131.76  1
192.168.132.189  8
192.168.132.89  7
192.168.41.194  601
192.168.95.143  3

This MASTER was down for far too long and seriously hurt stability, heh.
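The list above is sorted by IP as a string, so the worst offender ends up near the bottom. As a possible refinement (not what the script currently does), the hosts could be ranked by failure count instead. The sketch below uses a few counts copied from the results above.

```perl
use strict;
use warnings;

# A few per-IP failure counts copied from the results above.
my %totalip = (
    '192.168.41.194'  => 601,
    '192.168.131.164' => 9,
    '192.168.132.189' => 8,
    '192.168.132.89'  => 7,
    '192.168.95.143'  => 3,
);

# Numeric descending by count; ties fall back to the IP string
# so the order is deterministic.
my @ranked = sort { $totalip{$b} <=> $totalip{$a} or $a cmp $b } keys %totalip;

printf "%-16s %d\n", $_, $totalip{$_} for @ranked;
```

With this ordering, 192.168.41.194 and its 601 failures appear on the first line.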