Sort the URLs in the following file and find the three that occur most often (the top 3 by number of occurrences).
File contents (web.txt):
- http://www.google.com
- http://www.baidu.com
- http://www.sina.com
- http://www.bjtu.edu.cn
- http://www.codeproject.com
- http://www.csdn.com
- http://www.sohu.com
- http://www.yahoo.com
- http://mail.163.com
- http://www.bjtu.edu.cn
- http://www.codeproject.com
- http://www.csdn.com
- http://www.sohu.com
- http://www.yahoo.com
- http://mail.163.com
- http://www.codeproject.com
- http://www.csdn.com
- http://www.sohu.com
- http://www.yahoo.com
- http://mail.163.com
- http://www.qq.com
- http://www.hao123.com
- http://www.163.com
- http://youku.com
- http://taobao/com
- http://www.bjtu.edu.cn
- http://www.codeproject.com
- http://www.csdn.com
- http://www.sohu.com
- http://www.yahoo.com
- http://mail.163.com
- http://www.codeproject.com
- http://www.csdn.com
- http://www.sohu.com
- http://www.yahoo.com
- http://mail.163.com
- http://www.qq.com
- http://www.hao123.com
- http://www.163.com
- http://youku.com
- http://taobao/com
The shell script:
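#!/bin/bash
# foo FILE: print one "URL COUNT" line for each distinct URL found in FILE
foo()
{
    if [ $# -ne 1 ]; then
        echo "usage: foo FILE" >&2
        return 1    # return, not exit: exit would end the calling shell, and -1 is not a valid exit status
    fi
    local filename=$1
    # grep -E replaces the obsolescent egrep; -o prints only the matched URL, not the whole line
    grep -oE "http://[a-zA-Z0-9.]+\.[a-zA-Z]{2,3}" "$filename" |
    awk '{ count[$0]++ }    # count occurrences of each URL in an associative array
    END {
        for (ind in count)
            printf("%-30s %d\n", ind, count[ind]);
    }'
}
# sort options: -n numeric comparison, -r reverse (descending) order, -k 2 sort on the second field (the count)
foo web.txt | sort -nrk 2 | head -n 3 > websort2.txt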
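The sort options in the last line can be checked on a few made-up lines (the labels a, b, c and d are hypothetical sample data). This sketch assumes GNU sort, which, when two lines compare equal on the key, compares the whole lines as a last resort and applies -r to that comparison as well.

```shell
#!/bin/bash
# Made-up "label count" lines, in no particular order (like awk's for-in output).
sample='a 2
b 5
c 5
d 1'

# -n numeric, -r descending, -k 2 key runs from field 2 to the end of the line.
# b and c tie on the key, so GNU sort compares the whole lines as a last resort,
# and -r reverses that too. Prints: c 5, b 5, a 2, d 1 (one per line).
printf '%s\n' "$sample" | sort -nrk 2

# -k2,2nr limits the first key to field 2; -k1,1 adds an ascending tie-break on
# the label. Prints: b 5, c 5, a 2, d 1 (one per line).
printf '%s\n' "$sample" | sort -k2,2nr -k1,1
```

The second form is the safer choice when a top-N list should not depend on sort's fallback rules.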
Contents of the output file websort2.txt:
http://www.yahoo.com 5
http://www.sohu.com 5
http://www.csdn.com 5
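In this data five URLs (www.codeproject.com, www.csdn.com, www.sohu.com, www.yahoo.com and mail.163.com) each occur 5 times, so which three end up in the top 3 depends only on how ties are ordered. With `sort -nrk 2`, GNU sort falls back to a reversed whole-line comparison, which is why yahoo, sohu and csdn are kept. Note also that `http://taobao/com` does not match the pattern, so those two lines are never counted. A common alternative to the awk array is `sort | uniq -c`; the sketch below uses it and breaks ties alphabetically instead (the function name `top_urls` and its optional N argument are illustrative, not from the original). With that tie-break, web.txt would give http://mail.163.com, http://www.codeproject.com and http://www.csdn.com.

```shell
#!/bin/bash
# top_urls FILE [N]: print the N most frequent URLs in FILE (default 3)
# as "URL COUNT" lines, highest count first; equal counts are ordered by URL.
top_urls() {
    local file=$1 n=${2:-3}
    grep -oE 'http://[a-zA-Z0-9.]+\.[a-zA-Z]{2,3}' "$file" |
        sort |                      # bring identical URLs next to each other
        uniq -c |                   # collapse each run into "COUNT URL"
        awk '{ print $2, $1 }' |    # swap the columns to "URL COUNT"
        sort -k2,2nr -k1,1 |        # count descending, then URL ascending
        head -n "$n"
}
```

Usage: `top_urls web.txt > websort2.txt`.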