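Our CDN is used to distribute the client installer, and we need to know how many IPs attempted a download and how many downloaded successfully, so I wrote this CDN statistics script.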

Log format: #Fields: date time time-taken c-ip filesize s-ip s-port sc-status sc-bytes cs-method cs-uri-stem - rs-duration rs-bytes c-referrer c-user-agent customer-id x-pn_custom-1
2013-12-08 00:10:13 0 111.8.3.146 95877816 183.61.142.77 80 TCP_HIT/304 264 GET http://wpc.b078.i1.cdndelivery.com/00B078/myfile/DFPlayerInstaller.exe - 0 354 "-" "Mozilla/5.0 Gecko/20100115 Firefox/3.6" 45176 "-"
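To make the column numbers used below concrete, here is a minimal sketch that splits the sample record on whitespace and prints the four fields the statistics depend on. The helper name show_fields is my own and is not part of the script.

```shell
#!/bin/bash
# Sample record copied from the log excerpt above.
sample='2013-12-08 00:10:13 0 111.8.3.146 95877816 183.61.142.77 80 TCP_HIT/304 264 GET http://wpc.b078.i1.cdndelivery.com/00B078/myfile/DFPlayerInstaller.exe - 0 354 "-" "Mozilla/5.0 Gecko/20100115 Firefox/3.6" 45176 "-"'

# show_fields (hypothetical helper): print client IP, file size,
# cache/HTTP status and bytes sent for each record on stdin.
show_fields() {
    awk '{ printf "c-ip=%s filesize=%s sc-status=%s sc-bytes=%s\n", $4, $5, $8, $9 }'
}

printf '%s\n' "$sample" | show_fields
```

Because date and time are separate whitespace-delimited fields, c-ip is column 4, filesize is column 5, sc-status is column 8 and sc-bytes is column 9.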


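Approach: 1. On mcc.pacnetcdn.com, configure the CDN to upload its logs to the FTP directory /home/cdnlog/logs/.

     2. Attempted-download IPs: print column 4 (c-ip), then sort and de-duplicate.

     3. Successful-download IPs: a download counts as successful when the status is 200 and column 5 (filesize, the installer size) is not greater than column 9 (sc-bytes, the bytes actually sent); sort and de-duplicate those IPs. If column 5 is greater than column 9, the download did not complete.

     4. Exclude requests made by the CDN servers themselves (their own update traffic).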
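Steps 2 to 4 can be sketched in a single awk pass. This is only an illustration under my own assumptions: count_ips and sample_log are hypothetical helpers, the sample records are invented, and the status check is applied to column 8 only, which is slightly stricter than a plain grep for /200 over the whole line.

```shell
#!/bin/bash
# count_ips is a hypothetical helper, not part of the original script.
# It reads log lines on stdin and prints "<attempted IPs> <successful IPs>".
# $1 is an extended regex of client IPs to ignore (empty means ignore none).
count_ips() {
    awk -v skip="$1" '
        /^#/ { next }                              # drop the "#Fields:" header
        skip != "" && $4 ~ skip { next }           # step 4: ignore CDN server IPs
        { tried[$4] = 1 }                          # step 2: any request is an attempt
        $8 ~ /\/200$/ && $5 <= $9 { ok[$4] = 1 }   # step 3: status 200, full size sent
        END {
            for (ip in tried) t++
            for (ip in ok) s++
            print t + 0, s + 0
        }'
}

# Invented sample records (only the first nine columns matter here).
sample_log() {
    cat <<'EOF'
#Fields: date time time-taken c-ip filesize s-ip s-port sc-status sc-bytes
2013-12-08 00:10:13 0 192.0.2.1 1000 183.61.142.77 80 TCP_HIT/200 1200
2013-12-08 00:10:14 0 192.0.2.1 1000 183.61.142.77 80 TCP_HIT/200 300
2013-12-08 00:10:15 0 192.0.2.2 1000 183.61.142.77 80 TCP_MISS/200 400
2013-12-08 00:10:16 0 192.0.2.3 1000 183.61.142.77 80 TCP_HIT/304 264
2013-12-08 00:10:17 0 203.208.60.1 1000 183.61.142.77 80 TCP_HIT/200 1200
EOF
}

sample_log | count_ips '^203[.]208[.]60[.]'
```

With the 203.208.60.x address excluded, three distinct IPs attempted a download and one of them (192.0.2.1) received the full file with status 200. The dots are written as [.] because awk -v interprets backslash escapes in the assigned value.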


The actual script is as follows:

#!/bin/bash
date=`date -d yesterday +%Y%m%d`
rm -rf /home/cdnlog/logbak/*
cp /home/cdnlog/logs/wpc_B078_"$date"* /home/cdnlog/logbak/.
logsum=`ls -l /home/cdnlog/logbak/ |tail -1 |awk -F"_" '{print $4}' |cut -c3-4`

#Decompress each log and append its contents to the daily file
cd /home/cdnlog/logbak/ || exit 1
for i in `seq $logsum`
do
a=`printf "%02d" $i`
gunzip wpc_B078_"$date"_00$a.log.gz
cat wpc_B078_"$date"_00$a.log >>/home/cdnlog/download/$date
done

grep="211.157.145.50|180.240.184.*|192.16.[0-9]\>.*|192.16.[0-5][0-9]\>.*|192.16.6[0-3].*|192.30.[0-9]\>.*|192.30.[0-2][0-9]\>.*|192.30.3[0-1].*|198.7.1[6-9]\>.*|192.7.2[0-9]\>.*|192.7.3[0-1].*|5.104.6[4-9].*|5.104.7[0-1].*|46.22.6[4-9].*|46.22.7[0-9].*|68.232.3[2-9].*|68.232.4[0-7].*|72.21.8[0-9].*|72.21.9[0-5].*|93.184.20[8-9].*|93.184.21[0-9].*|93.184.22[0-3].*|108.161.2[4-5][0-9].*|110.232.17[6-9].*|117.18.23[2-9].*|117.103.183.*|200.201.194.1[6-9]\>|200.201.194.2[0-9]\>|200.201.194.3[0-1]|200.201.213.4[8-9].*|200.201.213.5[0-9].*|200.201.213.6[0-3].*|203.114.3[2-9].*|203.114.[4-5][0-9].*|203.114.6[0-3].*|101.226.16[6-9].*|180.153.236.*|182.118.2[0-2]\>.*|182.118.2[5-6]\>.*|182.118.35.*|182.118.5[5-7].*|182.118.69.*|61.55.185.*|183.60.213.105|183.60.214.19|183.60.215.50|101.227.4.*|42.120.16[0-1].*|42.156.13[6-9].*|112.125.20.41|180.210.243.*|199.80.54.*|203.208.60.*|66.249.6[4-6].*|66.249.7[3-5].*"

#Count the distinct IPs that attempted a download and the distinct IPs that downloaded successfully
downloadsum=`cat /home/cdnlog/download/"$date"|grep -v "#"|awk '{print $4}' |sort |uniq -c |grep -E -v "$grep"|wc -l`
downloadsuccess=`cat /home/cdnlog/download/"$date" |grep -v "#" |awk '$5 <= $9 {print $0}'|grep /200 |awk '{print $4}'|sort|uniq -c |grep -E -v "$grep"|wc -l`
echo $date $downloadsum $downloadsuccess >>/home/cdnlog/download/download.txt

#Delete logs older than 30 days
find /home/cdnlog/logs/ -name "*.log.gz" -mtime +30|xargs rm -f
find /home/cdnlog/download -name "20*" -mtime +30|xargs rm -f
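One thing to watch in the exclusion list: the entries are unanchored and their dots match any character, so a pattern such as 5.104.6[4-9].* also matches an unrelated address like 15.104.64.1. Below is a minimal sketch of a stricter filter. It assumes the input is one bare IP per line (for example from sort -u rather than uniq -c, whose count prefix would defeat the ^ anchor); drop_ips is a hypothetical helper and strict is my rewrite of two entries from the list.

```shell
#!/bin/bash
# drop_ips is a hypothetical helper: it removes IPs (one per line on stdin)
# whose beginning matches the extended regex in $1.
drop_ips() {
    grep -E -v "^($1)"
}

# The 5.104.64-71 entries from the list, rewritten with literal dots
# and a trailing dot so the third octet must match exactly.
strict='5[.]104[.](6[4-9]|7[01])[.]'

printf '%s\n' 5.104.64.1 15.104.64.1 5.104.6.4 | drop_ips "$strict"
```

Here only 5.104.64.1 is dropped; 15.104.64.1 and 5.104.6.4 pass through, whereas the unanchored form would also discard 15.104.64.1.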



Comments and discussion are welcome.