First, the file size:
[root@localhost dnslog]# du -sh namedgc.log.20140829
286M namedgc.log.20140829
Next, the line count: a little over 3 million lines.
[root@localhost dnslog]# wc -l namedgc.log.20140829
3052931 namedgc.log.20140829
A query entry in the log looks like this:
29-Aug-2014 18:18:18.303 client 10.28.2.254#55474: query: clients3.google.com IN A + (10.28.5.101)
About 2.9 million of those lines are queries:
[root@localhost dnslog]# grep query namedgc.log.20140829 |wc -l
2890487
Those queries cover a little over 70,000 distinct names (field 6 of each line, the queried domain name):
[root@localhost dnslog]# grep query namedgc.log.20140829 |awk '{print $6}' | sort | uniq |wc -l
71384
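The same pipeline can also show which names are queried most often, not just how many distinct ones there are. The following is a minimal sketch, assuming the log layout shown above (field 6 is the queried name); `top_names` is a hypothetical helper and the sample lines are made up so that the snippet runs on its own.

```shell
# Hypothetical helper: print the N most frequently queried names.
# Assumes the query-log layout shown above, where field 6 is the name.
top_names() {
    # $1 = log file, $2 = how many names to show (default 10)
    grep 'query:' "$1" \
        | awk '{print $6}' \
        | sort | uniq -c \
        | sort -rn \
        | head -n "${2:-10}" \
        | awk '{print $1, $2}'   # normalize the padding that uniq -c adds
}

# Made-up sample data, only so the sketch is self-contained.
sample=$(mktemp)
cat > "$sample" <<'EOF'
29-Aug-2014 18:18:18.303 client 10.28.2.254#55474: query: clients3.google.com IN A + (10.28.5.101)
29-Aug-2014 18:18:19.101 client 10.28.2.17#40112: query: clients3.google.com IN A + (10.28.5.101)
29-Aug-2014 18:18:20.007 client 10.28.2.17#40113: query: www.example.com IN A + (10.28.5.101)
EOF

top_names "$sample" 2
# Remove the temporary file when finished: rm -f "$sample"
```

On the real log this would be called as `top_names namedgc.log.20140829 20`.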
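Daily request volume is roughly 500,000 (note that these greps count every log line carrying that date, not only query lines):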
[root@localhost dnslog]# grep "29-Aug-2014" namedgc.log.20140829 |wc -l
525048
[root@localhost dnslog]# grep "28-Aug-2014" namedgc.log.20140829 |wc -l
553608
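Running one grep per date gets tedious when the file spans several days. As a rough sketch, a single awk pass can tally every date at once. `per_day` is a hypothetical helper, and it counts only lines containing `query:`, so its totals may come out slightly lower than the grep counts above.

```shell
# Hypothetical helper: count query lines per date in one pass.
# Assumes field 1 is the date, as in the log sample shown earlier.
per_day() {
    awk '/query:/ { count[$1]++ }
         END { for (d in count) print d, count[d] }' "$1" \
        | sort   # plain text sort; chronological only within one month
}

# Made-up sample data, only so the sketch is self-contained.
sample=$(mktemp)
cat > "$sample" <<'EOF'
28-Aug-2014 09:00:00.000 client 10.28.2.254#55474: query: clients3.google.com IN A + (10.28.5.101)
28-Aug-2014 09:00:01.000 client 10.28.2.17#40112: query: www.example.com IN A + (10.28.5.101)
29-Aug-2014 18:18:18.303 client 10.28.2.254#55475: query: clients3.google.com IN A + (10.28.5.101)
EOF

per_day "$sample"
# Remove the temporary file when finished: rm -f "$sample"
```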
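That averages out to nearly 7,000 DNS requests per computer. Assuming each web page contains 20 hyperlinks and each one triggers a lookup, that is 7,000 / 20 = 350 pages viewed per computer per day on average.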
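The per-computer average can also be derived from the log itself by treating each client IP as one computer. This is only a sketch under that assumption (NAT or DHCP churn would skew it); `per_client_avg` is a hypothetical helper, and it averages over the whole file rather than per day.

```shell
# Hypothetical helper: count distinct client IPs and the mean number of
# queries per client. Assumes field 4 looks like "10.28.2.254#55474:".
per_client_avg() {
    awk '/query:/ {
             split($4, part, "#")   # part[1] is the client IP
             seen[part[1]]++
             total++
         }
         END {
             for (ip in seen) clients++
             if (clients)
                 printf "%d clients, %d queries, %.1f per client\n",
                        clients, total, total / clients
         }' "$1"
}

# Made-up sample data, only so the sketch is self-contained.
sample=$(mktemp)
cat > "$sample" <<'EOF'
29-Aug-2014 18:18:18.303 client 10.28.2.254#55474: query: clients3.google.com IN A + (10.28.5.101)
29-Aug-2014 18:18:19.101 client 10.28.2.254#55475: query: www.example.com IN A + (10.28.5.101)
29-Aug-2014 18:18:20.007 client 10.28.2.17#40113: query: www.example.com IN A + (10.28.5.101)
EOF

per_client_avg "$sample"
# Remove the temporary file when finished: rm -f "$sample"
```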
Now for the .com names. Reducing each one to the form xxx.com leaves close to 10,000 distinct main sites:
[root@localhost dnslog]# awk -F ".com" '{print $1}' /tmp/com_uniq_url.list | sed 's/\./ /g'| awk '{print $NF".com"}' > /tmp/com_main_web.com
[root@localhost dnslog]# sort /tmp/com_main_web.com | uniq |wc -l
9841
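One caveat about the command above: awk treats a multi-character `-F` value as a regular expression, so the dot in `".com"` matches any character, and a name such as welcome.example.com is split at "lcom" rather than at the real suffix. A stricter variant is sketched below. `com_main_sites` is a hypothetical helper, it assumes an input file with one name per line, and it keeps only names whose last label is `com`, so its count will not necessarily match the 9841 above.

```shell
# Hypothetical helper: reduce names ending in ".com" to their last two
# labels (for example clients3.google.com -> google.com), de-duplicated.
com_main_sites() {
    awk -F. 'NF >= 2 && tolower($NF) == "com" {
                 print tolower($(NF-1)) ".com"
             }' "$1" | sort -u
}

# Made-up sample data, only so the sketch is self-contained.
names=$(mktemp)
cat > "$names" <<'EOF'
clients3.google.com
www.google.com
welcome.example.com
www.example.org
EOF

com_main_sites "$names"
com_main_sites "$names" | wc -l
# Remove the temporary file when finished: rm -f "$names"
```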
It's late, so time to wash up and get to bed. I'll pick this up again tomorrow if I have time.