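First, the requirement. Until now, IPs had to be added to the ATS proxy.pac file by hand, which is tedious and error-prone, and a mistake can stop ATS from working properly. So I wrote this script to collect the IPs automatically.
How it works: once the proxy is configured, whenever a video on Youku, Tudou, or a similar site is played, ATS records the video URL in the cacheurl.log file, in the following format: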
20130320.10h35m51s Adding pattern/replacement pair: 'http://[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}[^&]*/f4v/.*id=tudou.itemid\=([0-9]*).*' -> 'http://www.tudou.com/$1'
20130320.10h35m51s Adding pattern/replacement pair: 'http://[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}[^&]*/flv/.*id=tudou.itemid\=([0-9]*).*' -> 'http://www.tudou.com/$1'
20130320.10h35m51s Adding pattern/replacement pair: 'http://[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}/youku/.*/(.*-.*-.*-.*-[^?]*)' -> 'http://www.youku.com/$1'
20130320.10h35m51s Adding pattern/replacement pair: 'http://[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}/youku/.*/(.*-.*-.*-.*-.*)' -> 'http://www.youku.com/$1'
20130320.10h35m51s Adding pattern/replacement pair: 'http://[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}/sohu/[0-9]*/[0-9]*/[0-9]*/(.*).mp4?key=.*' -> 'http://tv.sohu.com/$1.mp4'
20130320.10h37m04s Rewriting cache URL for http://101.226.245.75/youku/6772A5707643182BA3728D4724/0300020100510B8A4D6AF00742D2DA77BDF6F5-18FE-4AC5-6C87-2CD878064B12.flv to http://www.youku.com/0300020100510B8A4D6AF00742D2DA77BDF6F5-18FE-4AC5-6C87-2CD878064B12.flv
20130320.10h37m06s Rewriting cache URL for http://113.105.227.11/youku/69746EE083C4581DA95E9E61C0/03000807005141928321D603BAF2B19EFEFB4D-7853-3901-8D53-6DAE474A3FF7.mp4?start=85 to http://www.youku.com/03000807005141928321D603BAF2B19EFEFB4D-7853-3901-8D53-6DAE474A3FF7.mp4
20130320.10h37m07s Rewriting cache URL for http://121.9.202.11/youku/6773BA488D44D749F490B3962/03000201005148F00667A4050A8C135F42E16F-C0BE-EEE0-9F09-F8A6035730E9.flv to http://www.youku.com/03000201005148F00667A4050A8C135F42E16F-C0BE-EEE0-9F09-F8A6035730E9.flv
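As you can see, the first 5 lines are not what we want. What we want is: http://121.9.202.11/youku/6773BA488D44D749F490B3962/03000201005148F00667A4050A8C135F42E16F-C0BE-EEE0-9F09-F8A6035730E9.flv
After grabbing this URL, we process it further: extract the IP and convert it to its C-class segment, for example 121.9.202.0 255.255.255.0.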
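The extraction step can be sketched on a single log line. This is only an illustration: the function name log_line_to_c_segment is made up here, and it assumes each log line starts with the timestamp, so that the original URL is the sixth space-separated field.

```shell
#!/bin/bash
# Hypothetical helper: print the C-class network address and mask for the
# origin server found in one "Rewriting cache URL for ..." log line.
log_line_to_c_segment() {
    # Field 6 (space-separated) is the original URL;
    # field 3 of that URL ('/'-separated) is the host part.
    echo "$1" | cut -d' ' -f6 | cut -d'/' -f3 |
        sed -n 's/^\([0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\)\.[0-9]\{1,3\}$/\1.0 255.255.255.0/p'
}

line='20130320.10h37m07s Rewriting cache URL for http://121.9.202.11/youku/6773BA488D44D749F490B3962/03000201005148F00667A4050A8C135F42E16F-C0BE-EEE0-9F09-F8A6035730E9.flv to http://www.youku.com/03000201005148F00667A4050A8C135F42E16F-C0BE-EEE0-9F09-F8A6035730E9.flv'
log_line_to_c_segment "$line"   # prints: 121.9.202.0 255.255.255.0
```

Because the sed expression only prints hosts that are dotted IPv4 addresses, a line whose URL uses a host name produces no output at all.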
Next, this IP has to be compared with the IPs already in the proxy.pac file. The format of proxy.pac is as follows:
if (isInNet(host, "183.60.162.0", "255.255.255.0")) {return "PROXY 202.192.72.198:80; DIRECT";}
if (isInNet(host, "122.72.10.0", "255.255.255.0")) {return "PROXY 202.192.72.198:80; DIRECT";}
if (isInNet(host, "218.61.209.0", "255.255.255.0")) {return "PROXY 202.192.72.198:80; DIRECT";}
if (isInNet(host, "59.63.173.0", "255.255.255.0")) {return "PROXY 202.192.72.198:80; DIRECT";}
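The job is to extract the IPs from these lines and compare them with the IPs grabbed from the URLs. If an IP is not yet in proxy.pac, it is written to the pac file in the same format.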
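Taken in isolation, the comparison step might look like the sketch below. The function names pac_networks and missing_networks and the temporary files are made up for illustration. It matches whole lines as fixed strings, on the assumption that an existing entry such as 1.2.3.0 should not hide a new entry such as 11.2.3.0.

```shell
#!/bin/bash
# Hypothetical helper: print the network addresses already listed in a pac file ($1).
pac_networks() {
    # Capture the first quoted argument of every isInNet(host, "...", ...) line.
    sed -n 's/.*isInNet(host, *"\([0-9.]*\)".*/\1/p' "$1"
}

# Hypothetical helper: print the candidates in file $2 that the pac file $1 lacks.
missing_networks() {
    local known
    known=$(mktemp)
    pac_networks "$1" > "$known"
    # -x matches whole lines only; -F treats each pattern as a fixed string,
    # so the dots are literal dots rather than regex wildcards.
    grep -v -x -F -f "$known" "$2"
    rm -f "$known"
}

# Illustrative data only.
workdir=$(mktemp -d)
cat > "$workdir/proxy.pac" <<'EOF'
if (isInNet(host, "183.60.162.0", "255.255.255.0")) {return "PROXY 202.192.72.198:80; DIRECT";}
if (isInNet(host, "122.72.10.0", "255.255.255.0")) {return "PROXY 202.192.72.198:80; DIRECT";}
return "DIRECT";}
EOF
printf '%s\n' 121.9.202.0 122.72.10.0 > "$workdir/newip"
missing_networks "$workdir/proxy.pac" "$workdir/newip"   # prints: 121.9.202.0
```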
The script is as follows:
#!/bin/bash
# Drop the first 5 lines (the "Adding pattern/replacement pair" entries)
sed '1,5d' /usr/local/var/log/trafficserver/cacheurl.log > /tmp/cacheurl.bk
# Get the new IPs from cacheurl.log and set the last octet of each to 0
cut -d' ' -f6 /tmp/cacheurl.bk | cut -d'/' -f3 | sed -n 's/[0-9]\{1,3\}$/0/p' | sort -u > /tmp/newip
# Get the old IPs from the pac file
sed -n '/if[[:space:]](isInNet(host/p' /usr/local/etc/trafficserver/proxy.pac | cut -d' ' -f3 | sed -n 's/"//p' | sed -n 's/",//p' > /tmp/oldip
# Keep only the new IPs that are not already in the pac file
# (-x: whole-line match, -F: fixed strings, so "." is not a regex wildcard)
grep -v -x -F -f /tmp/oldip /tmp/newip > /tmp/neededip
echo "The needed IPs have been written to /tmp/neededip; have a look at it if you wish"
# Remove the last line of the pac file (return "DIRECT";}) so entries can be appended
sed -i '$d' /usr/local/etc/trafficserver/proxy.pac
# Write the IPs to the pac file
while read -r IP
do
    echo 'if (isInNet(host, "'"$IP"'", "255.255.255.0")) {return "PROXY x.x.x.x:1314; DIRECT";}' >> /usr/local/etc/trafficserver/proxy.pac
done < /tmp/neededip
# Put the closing line back
echo 'return "DIRECT";}' >> /usr/local/etc/trafficserver/proxy.pac
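Result of running the script:
The IPs in proxy.pac are collected (oldip)
The IPs in cacheurl.log are collected (newip)
After comparing the two, the IPs that are needed are recorded (neededip)
The needed IPs are written to proxy.pac in the fixed format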
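Extension:
Every run of this script scans the log from beginning to end, so efficiency drops as the log grows. What is needed is a way to query incrementally: after a query finishes, record the current number of log lines (say 100), and the next query starts from line 101 and ignores everything before it. I still need to work out exactly how to implement this, or perhaps an expert can offer some guidance; I have only just started learning shell and have no experience.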
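One possible way to do this, offered as a rough sketch rather than a tested solution for ATS: keep the line count from the previous run in a small state file and print only the lines after it. The function name read_new_lines and the state file are assumptions introduced here. The sketch also assumes that a log with fewer lines than last time has been rotated or truncated and should be read again from the top.

```shell
#!/bin/bash
# Hypothetical helper: print only the log lines added since the previous call.
# $1 = log file, $2 = state file holding the line count seen last time.
read_new_lines() {
    local log=$1 state=$2 last=0 total
    [ -f "$state" ] && last=$(cat "$state")
    total=$(wc -l < "$log")
    total=$((total))    # normalise the number (some wc versions pad it with spaces)
    # If the log shrank (rotated or truncated), start again from the top.
    [ "$total" -lt "$last" ] && last=0
    # Print lines last+1 .. total; head caps the output at the counted total,
    # so lines appended while we run are left for the next call.
    tail -n +"$((last + 1))" "$log" | head -n "$((total - last))"
    echo "$total" > "$state"
}

# Illustrative run on a throwaway log.
demo=$(mktemp -d)
printf 'line1\nline2\n' > "$demo/cacheurl.log"
read_new_lines "$demo/cacheurl.log" "$demo/cacheurl.pos"   # prints line1 and line2
printf 'line3\n' >> "$demo/cacheurl.log"
read_new_lines "$demo/cacheurl.log" "$demo/cacheurl.pos"   # prints only line3
```

In the script above, this would take the place of the sed '1,5d' step. Since the five "Adding pattern/replacement pair" lines then only appear on the first run, it is probably safer to filter the output with grep 'Rewriting cache URL for' than to delete a fixed number of lines.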