Using Pig to count the number of hits per IP address

The log file has the following format:
220.181.108.151 - - [31/Jan/2012:00:02:32 +0800] "GET /home.php?mod=space&uid=158&do=album&view=me&from=space HTTP/1.1" 200 8784 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
208.115.113.82 - - [31/Jan/2012:00:07:54 +0800] "GET /robots.txt HTTP/1.1" 200 582 "-" "Mozilla/5.0 (compatible; Ezooms/1.0; ezooms.bot@gmail.com)"
220.181.94.221 - - [31/Jan/2012:00:09:24 +0800] "GET /home.php?mod=spacecp&ac=pm&op=showmsg&handlekey=showmsg_3&touid=3&pmid=0&daterange=2&pid=398&tid=66 HTTP/1.1" 200 10070 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"
112.97.24.243 - - [31/Jan/2012:00:14:48 +0800] "GET /data/cache/style_2_common.css?AZH HTTP/1.1" 200 57752 "http://f.dataguru.cn/forum-58-1.html" "Mozilla/5.0 (iPhone; CPU iPhone OS 5_0_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Mobile/9A406"
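
To see what splitting on a space delimiter does to a line in this format, here is a minimal Python sketch (it is not part of the Pig job). It splits one of the sample lines above on single spaces; the variable names are illustrative only.

```python
# One line copied from the sample log above.
line = ('208.115.113.82 - - [31/Jan/2012:00:07:54 +0800] '
        '"GET /robots.txt HTTP/1.1" 200 582 "-" '
        '"Mozilla/5.0 (compatible; Ezooms/1.0; ezooms.bot@gmail.com)"')

# Split on single spaces, the same delimiter the Pig job uses later.
fields = line.split(" ")

ip = fields[0]            # client IP address
second = fields[1]        # the "-" placeholder, not the requested page
request_path = fields[6]  # path inside the quoted request string

print(ip, second, request_path)
```

Only the first field is needed for the per-IP count; the requested path sits much further along the line because the timestamp and the quoted request are themselves broken up by spaces.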
1. Download Pig
Download URL: http://www.apache.org/dyn/closer.cgi/pig

2. Install Pig
Extract the archive
[grid@hadoop1 ~]$ tar -zxf pig-0.14.0.tar.gz

Set the environment variables (add the following lines to .bash_profile)
[grid@hadoop1 ~]$ vi .bash_profile
PIG_INSTALL=/home/grid/pig-0.14.0
PIG_CLASSPATH=/home/grid/hadoop-1.2.1/conf/
PATH=$PATH:$PIG_INSTALL/bin
export PIG_INSTALL PATH PIG_CLASSPATH

Make sure JAVA_HOME is set
Edit the hosts file so that the cluster hostnames (hadoop1 in this example) resolve

Verify the installation
[grid@hadoop1 ~]$ pig -help

Connect to the Hadoop cluster
[grid@hadoop1 ~]$ pig
grunt> ls
hdfs://hadoop1:9000/user/grid/in    <dir>
hdfs://hadoop1:9000/user/grid/out    <dir>

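3. Run the job
Load the data. The line is split on spaces, so the first field is the client IP. The second field, named page here, actually holds the "-" placeholder rather than the requested page; it is dropped in the next step.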
grunt> A = LOAD 'in/8/access_log.txt' USING PigStorage (' ') AS ( ip, page);
grunt> DESCRIBE A;
A: {ip: bytearray,page: bytearray}
Drop the fields that are not needed
grunt> B = FOREACH A GENERATE ip;
Group by IP
grunt> C = GROUP B BY ip;
grunt> DESCRIBE C;
C: {group: bytearray,B: {(ip: bytearray)}}
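
The DESCRIBE output says that each row of C pairs a group key with a bag of the original B tuples. As a rough illustration only, the Python sketch below builds the same shape from a hypothetical three-row stand-in for B; it is not how Pig executes the grouping.

```python
from collections import defaultdict

# Hypothetical stand-in for relation B: one single-field tuple per log line.
B = [
    ("220.181.108.151",),
    ("208.115.113.82",),
    ("220.181.108.151",),
]

# Like GROUP B BY ip: each distinct key collects the tuples that share it.
C = defaultdict(list)
for row in B:
    C[row[0]].append(row)

for group, bag in C.items():
    print((group, bag))
```

Counting the tuples in each bag is then all that is left to do, which is what the COUNT step does.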
Count the rows in each group
grunt> D = FOREACH C GENERATE group AS ip, COUNT(B) AS count;
View the results
grunt> DUMP D;
(127.0.0.1,2)
(1.59.65.67,2)
(112.4.2.19,9)
(112.4.2.51,80)
(60.2.99.33,42)
(69.28.58.5,1)
(69.28.58.6,9)
(69.28.58.8,5)
(1.193.3.227,3)
(1.202.221.3,6)
(117.136.9.4,6)
(121.31.62.3,26)
(182.204.8.4,59)
(183.9.112.2,25)
(221.12.37.6,25)
(223.4.16.88,2)
(27.9.110.75,122)
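
For a quick sanity check of results like these on a small local copy of the log, the same count can be sketched in plain Python. This is an assumption-laden cross-check, not a replacement for the Pig job: count_hits is a hypothetical helper, and the sample lines are shortened, made-up entries in the same format.

```python
from collections import Counter

def count_hits(lines):
    """Return a Counter mapping each client IP to its number of log lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # The IP is everything before the first space.
        counts[line.split(" ", 1)[0]] += 1
    return counts

# Hypothetical miniature log in the same format as the real file.
sample = [
    '220.181.108.151 - - [31/Jan/2012:00:02:32 +0800] "GET /home.php HTTP/1.1" 200 8784',
    '208.115.113.82 - - [31/Jan/2012:00:07:54 +0800] "GET /robots.txt HTTP/1.1" 200 582',
    '220.181.108.151 - - [31/Jan/2012:00:09:24 +0800] "GET /home.php HTTP/1.1" 200 10070',
]

hits = count_hits(sample)
for ip, n in hits.most_common():
    print((ip, n))
```

Because count_hits accepts any iterable of lines, an open file object for a local copy of the log can be passed in directly.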



Reposted from: https://my.oschina.net/zc741520/blog/376475
