Big Data Series: Operations (Self-Built Big Data Platform)
(5) Pig Operations
- Install Pig Clients on the master node, open a Linux shell, and start Pig's Grunt shell in MapReduce mode.
[root@master ~]# pig
- Install Pig Clients on the master node, open a Linux shell, and start Pig's Grunt shell in Local mode.
[root@master ~]# pig -x local
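- Use Pig in Local mode to count the clicks per IP in the system log access-log.txt. Group the records by IP with a GROUP BY statement, iterate over the columns of the relation with the FOREACH operator to count the rows in each group, and finally show the result with a DUMP statement.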
[root@master ~]# pig -x local
grunt> copyFromLocal /root/tiku/Pig/access-log.txt /user/root/access-log.txt
grunt> A = LOAD '/user/root/access-log.txt' USING PigStorage('\t') AS (ip,others);
grunt> group_ip = group A by ip;
grunt> result = foreach group_ip generate group,COUNT(A);
grunt> dump result;
(59.61.216.4 - - [11/May/2016:14:33:10 -0400] "GET /assets/fonts/fontawesome-webfont.woff?v=3.2.1 HTTP/1.1" 200 43572 "http://gs.1daoyun.com/assets/stylesheets/light-theme.css" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0" "-",1)
(61.160.71.250 - - [11/May/2016:16:03:25 -0400] "GET /lms/myexam.action HTTP/1.1" 200 10762 "http://gs.1daoyun.com/lms/competionlist.action" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.87 Safari/537.36" "-",1)
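About copyFromLocal /root/tiku/Pig/access-log.txt /user/root/access-log.txt:
/root/tiku/Pig/access-log.txt is where I keep my exercise files.
/user/root/ is the directory the log file is copied to so that Pig can load it; when you start pig, its startup output shows the storage location it is using.
The root in this path is the user on the local file system, because Pig was started in Local mode.
If Pig is started in MapReduce mode, the path belongs to the HDFS user instead.
About LOAD '/user/root/access-log.txt' USING PigStorage('\t') AS (ip,others);:
The delimiter passed to PigStorage must match the delimiter actually used in the text file.
Note that in the output above every group key is an entire log line with a count of 1. The lines contain no tab, so splitting on '\t' puts the whole line into ip. For a space-separated log like this one, PigStorage(' ') is needed so that ip holds only the address and the counts are per IP.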
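The per-IP count can also be cross-checked outside Pig with standard shell tools. The following is only a sketch: it assumes the log is whitespace-separated with the client IP as the first field, and the function name count_ips is hypothetical, not part of the exercise.

```shell
# count_ips FILE: print "<ip> <count>" for every distinct first field in FILE.
# Assumption: fields are separated by whitespace and the IP is field 1.
count_ips() {
    awk 'NF > 0 { hits[$1]++ } END { for (ip in hits) print ip, hits[ip] }' "$1" | sort
}

# Example (path taken from the exercise above):
#   count_ips /root/tiku/Pig/access-log.txt
```

If the numbers from Pig and from this function disagree, the delimiter given to PigStorage is the first thing to check.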
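- Use Pig to compute the highest temperature of each year in the weather data set temperature.txt. Group the records by year with a GROUP BY statement, iterate over the columns of the relation with the FOREACH operator to compute the maximum of each group, and finally show the result with a DUMP statement.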
grunt> copyFromLocal /root/tiku/Pig/temperature.txt /user/root/temperature.txt
grunt> A = LOAD '/user/root/temperature.txt' USING PigStorage(' ') AS (year:int, temp:int);
grunt> group_year = group A by year;
grunt> result = foreach group_year generate group,MAX(A.temp);
grunt> dump result;
(1990,38)
(1991,38)
(1992,38)
(1993,39)
(1994,39)
(1995,39)
(1996,40)
(1997,33)
(1998,37)
(1999,38)
(2000,38)
(2001,40)
(2002,38)
(2003,40)
(2004,40)
(2005,38)
(2006,34)
(2007,39)
(2008,38)
(2009,39)
(2010,39)
(2011,36)
(2012,40)
(2013,36)
(2014,37)
(2015,39)
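The yearly maxima can be verified independently of Pig. This is a minimal sketch that assumes each line of temperature.txt holds an integer year and an integer temperature separated by whitespace, as the LOAD schema above implies. The function name max_temp_by_year is hypothetical.

```shell
# max_temp_by_year FILE: print "(year,max)" per year, in the same shape as DUMP.
# Assumption: each non-empty line is "<year> <temp>", both integers.
max_temp_by_year() {
    awk 'NF >= 2 {
        # Keep the largest temperature seen so far for this year.
        if (!($1 in max) || $2 + 0 > max[$1]) max[$1] = $2 + 0
    } END {
        for (y in max) printf "(%s,%d)\n", y, max[y]
    }' "$1" | sort
}

# Example (path taken from the exercise above):
#   max_temp_by_year /root/tiku/Pig/temperature.txt
```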
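- Use Pig to count the IP addresses of each country in the data set ip_to_country. Group the records by country with a GROUP BY statement, iterate over the columns of the relation with the FOREACH operator to count the IP addresses in each group, then save the result to the /data/pig/output directory and view it.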
grunt> copyFromLocal /root/tiku/Pig/ip_to_country.txt /user/root/ip_to_country.txt
grunt> A = LOAD '/user/root/ip_to_country.txt' USING PigStorage('\t') AS (ip:chararray,country:chararray);
grunt> group_country = group A by country;
grunt> result = foreach group_country generate flatten(group),COUNT(A) as counts;
grunt> store result into '/data/pig/output';
grunt> dump result;
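The result path in store result into '/data/pig/output'; does not need to be created on HDFS beforehand.
In fact it must not already exist on HDFS, otherwise the job fails with an error saying the path already exists.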
(Iraq,1)
(Oman,1)
(Peru,3)
(Chile,7)
(China,252)
(Egypt,6)
(Gabon,1)
(India,30)
(Italy,43)
(Japan,177)
(Macau,1)
(Nepal,1)
(Qatar,1)
(Spain,21)
(Yemen,2)
(Angola,2)
(Brazil,38)
(Canada,75)
(Europe,34)
(France,58)
(Greece,6)
(Israel,6)
(Kuwait,5)
(Latvia,1)
(Mexico,23)
(Norway,18)
(Poland,15)
(Serbia,1)
(Sweden,17)
(Taiwan,26)
(Turkey,16)
(Albania,1)
(Algeria,2)
(Austria,14)
(Bahrain,1)
(Belarus,1)
(Belgium,14)
(Croatia,2)
(Denmark,11)
(Ecuador,3)
(Estonia,2)
(Finland,13)
(Germany,89)
(Hungary,2)
(Iceland,1)
(Ireland,5)
(Morocco,19)
(Nigeria,1)
(Romania,13)
(Senegal,1)
(Tunisia,3)
(Ukraine,10)
(Uruguay,2)
(Vietnam,13)
(Barbados,1)
(Botswana,1)
(Bulgaria,6)
(Colombia,21)
(Malaysia,8)
(Pakistan,4)
(Portugal,3)
(Slovenia,2)
(Thailand,10)
(Argentina,13)
(Australia,68)
(Guatemala,1)
(Hong Kong,8)
(Indonesia,29)
(Lithuania,6)
(Macedonia,1)
(Mauritius,10)
(Singapore,5)
(Venezuela,4)
(Azerbaijan,1)
(Costa Rica,2)
(Kazakhstan,3)
(Martinique,1)
(Uzbekistan,1)
(Netherlands,28)
(New Zealand,9)
(Philippines,7)
(Switzerland,15)
(Saudi Arabia,4)
(South Africa,20)
(United States,1379)
(Czech Republic,7)
(United Kingdom,93)
(Anonymous Proxy,1)
(Dominican Republic,1)
(Korea, Republic of,70)
(Russian Federation,36)
(Satellite Provider,2)
(Moldova, Republic of,1)
(Syrian Arab Republic,1)
(United Arab Emirates,2)
(Bosnia and Herzegovina,1)
(Iran, Islamic Republic of,2)
(Tanzania, United Republic of,1)
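DUMP prints the relation to the console, but the files written by STORE can also be inspected directly. The sketch below assumes Pig was started in Local mode, so that /data/pig/output is a directory on the local file system. The function name show_pig_output is hypothetical.

```shell
# show_pig_output DIR: print the part files that STORE wrote under DIR.
# Assumption: DIR is on the local file system (Pig started with -x local).
show_pig_output() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "no such output directory: $dir" >&2
        return 1
    fi
    # STORE writes one or more part-* files; PigStorage separates fields with tabs by default.
    cat "$dir"/part-*
}

# Example:
#   show_pig_output /data/pig/output
# In MapReduce mode the directory is on HDFS, so the equivalent would be:
#   hdfs dfs -cat /data/pig/output/part-*
```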
Error when running Pig jobs:
2020-04-02 13:02:59,410 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-02 13:02:59,521 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
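This is probably because the historyserver service is not running: port 10020 is the default address of the MapReduce JobHistory Server, and the client keeps retrying it after the job has finished.
Solution: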
[root@master sbin]# ./mr-jobhistory-daemon.sh start historyserver
This starts the historyserver service.
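Before or after starting the service, it can be useful to confirm whether the history server process is actually up. A minimal sketch, assuming the JDK's jps tool is on the PATH and that the process is listed under the name JobHistoryServer. The function name has_history_server is hypothetical.

```shell
# has_history_server LISTING: succeed if the given jps listing
# contains a JobHistoryServer process.
has_history_server() {
    printf '%s\n' "$1" | grep -q 'JobHistoryServer'
}

# Example:
#   if has_history_server "$(jps)"; then
#       echo "JobHistoryServer is running"
#   else
#       echo "JobHistoryServer is not running"
#   fi
```

Taking the listing as an argument keeps the check easy to try with sample text on a machine that has no Hadoop installed.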
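Thanks to 先电云 (Xiandian Cloud) for providing the exercise set.
Thanks to Apache for its open-source technology and support.
Thanks to the bloggers 抛物线, mn525520, and 菜鸟一枚2019 for their related posts.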