Big Data Operations: Pig

Pig exercises:

  1. Install the Pig client on the master node, then open a Linux shell and start the Grunt shell in MapReduce mode.
    [root@master ~]# pig -x mapreduce
    19/05/13 12:22:05 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
    19/05/13 12:22:05 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
    19/05/13 12:22:05 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
    2019-05-13 12:22:05,369 [main] INFO org.apache.pig.Main - Apache Pig version 0.16.0.2.6.1.0-129 (rexported) compiled May 31 2017, 03:39:20
    2019-05-13 12:22:05,369 [main] INFO org.apache.pig.Main - Logging error messages to: /root/pig_1557750125367.log
    2019-05-13 12:22:05,391 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /root/.pigbootup not found
    2019-05-13 12:22:06,024 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://master.hadoop:8020
    2019-05-13 12:22:07,205 [main] INFO org.apache.pig.PigServer - Pig Script ID for the session: PIG-default-538791d0-3538-4bbe-8b81-65255cc47648
    2019-05-13 12:22:07,595 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1.hadoop:8188/ws/v1/timeline/
    2019-05-13 12:22:07,712 [main] INFO org.apache.pig.backend.hadoop.PigATSClient - Created ATS Hook
    grunt>

  2. Install the Pig client on the master node, then open a Linux shell and start the Grunt shell in Local mode.
    [root@master ~]# pig -x local
    19/05/13 12:29:21 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
    19/05/13 12:29:21 INFO pig.ExecTypeProvider: Picked LOCAL as the ExecType
    2019-05-13 12:29:21,050 [main] INFO org.apache.pig.Main - Apache Pig version 0.16.0.2.6.1.0-129 (rexported) compiled May 31 2017, 03:39:20
    2019-05-13 12:29:21,050 [main] INFO org.apache.pig.Main - Logging error messages to: /root/pig_1557750561048.log
    2019-05-13 12:29:21,081 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /root/.pigbootup not found
    2019-05-13 12:29:21,325 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
    2019-05-13 12:29:21,543 [main] INFO org.apache.pig.PigServer - Pig Script ID for the session: PIG-default-0778a22e-1545-4b39-a22b-dc7b1dbaf600
    2019-05-13 12:29:21,543 [main] WARN org.apache.pig.PigServer - ATS is disabled since yarn.timeline-service.enabled set to false
    grunt>

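  3. Use Pig in Local mode to count the hits per IP address in the system log access-log.txt. Group the records by IP with a GROUP BY statement, iterate over the columns of the relation with the FOREACH operator to count the rows in each group, and finally display the result with a DUMP statement.
    [root@master ~]# pig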
    19/05/13 18:36:09 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
    19/05/13 18:36:09 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
    19/05/13 18:36:09 INFO pig.ExecTypeProvider: Trying ExecType : TEZ_LOCAL
    19/05/13 18:36:09 INFO pig.ExecTypeProvider: Trying ExecType : TEZ
    19/05/13 18:36:09 INFO pig.ExecTypeProvider: Picked TEZ as the ExecType
    2019-05-13 18:36:09,533 [main] INFO org.apache.pig.Main - Apache Pig version 0.16.0.2.6.1.0-129 (rexported) compiled May 31 2017, 03:39:20
    2019-05-13 18:36:09,533 [main] INFO org.apache.pig.Main - Logging error messages to: /root/pig_1557772569524.log
    2019-05-13 18:36:09,590 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /root/.pigbootup not found
    2019-05-13 18:36:10,298 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://master.hadoop:8020
    2019-05-13 18:36:11,042 [main] INFO org.apache.pig.PigServer - Pig Script ID for the session: PIG-default-cab8cfb8-c55d-4fb3-8dc6-d3c7e7ba1919
    2019-05-13 18:36:11,425 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1.hadoop:8188/ws/v1/timeline/
    2019-05-13 18:36:11,533 [main] INFO org.apache.pig.backend.hadoop.PigATSClient - Created ATS Hook
    grunt> copyFromLocal /opt/access.log /user/root/input/
    grunt> A = load '/user/root/log1.txt' using PigStorage('\t') AS (ip,others);
    grunt> group_ip = group A by ip;
    grunt> result = foreach group_ip generate group,COUNT(A);
    grunt> dump result;
    (198.108.67.48 - - [12/Jan/2019:23:30:31 +0800] "" 400 0 "-" "-",1)
    (198.108.67.48 - - [13/Feb/2019:13:03:14 +0800] "" 400 0 "-" "-",1)
    (198.108.67.48 - - [23/Jan/2019:17:22:36 +0800] "" 400 0 "-" "-",1)
    (198.108.67.48 - - [29/Jan/2019:18:30:56 +0800] "" 400 0 "-" "-",1)
    (18.225.36.136 - - [14/Feb/2019:08:59:02 +0800] "\x05\x01\x00" 400 166 "-" "-",1)
    (183.57.54.43 - - [19/Jan/2019:14:04:06 +0800] "GET / HTTP/1.0" 403 162 "-" "-",1)
    (183.57.54.43 - - [21/Feb/2019:13:54:37 +0800] "GET / HTTP/1.0" 403 162 "-" "-",1)
    (183.57.54.43 - - [24/Dec/2018:06:49:19 +0800] "GET / HTTP/1.0" 403 162 "-" "-",1)
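
In the output above, every group key is a complete log line with a count of 1. That is consistent with the log fields being separated by spaces while the LOAD statement splits on tabs, so the whole line ends up in the ip column. The sketch below is a plain-Python cross-check, not Pig code: it assumes the IP is the first space-separated field and shows the per-IP counts the exercise asks for. The function name count_hits_by_ip is made up for this illustration, and the sample lines are copied from the output above.

```python
from collections import Counter


def count_hits_by_ip(lines):
    """Group log lines by their first space-separated field and count each group.

    Assumption: the client IP is the first field and fields are space-separated.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        ip = line.split(" ", 1)[0]
        counts[ip] += 1
    return dict(counts)


# A few lines taken from the dump output above.
sample = [
    '198.108.67.48 - - [12/Jan/2019:23:30:31 +0800] "" 400 0 "-" "-"',
    '198.108.67.48 - - [13/Feb/2019:13:03:14 +0800] "" 400 0 "-" "-"',
    '183.57.54.43 - - [19/Jan/2019:14:04:06 +0800] "GET / HTTP/1.0" 403 162 "-" "-"',
    '183.57.54.43 - - [21/Feb/2019:13:54:37 +0800] "GET / HTTP/1.0" 403 162 "-" "-"',
    '183.57.54.43 - - [24/Dec/2018:06:49:19 +0800] "GET / HTTP/1.0" 403 162 "-" "-"',
]

for ip, hits in sorted(count_hits_by_ip(sample).items()):
    print((ip, hits))
```

On the Pig side, the matching change would presumably be to pass a space delimiter to the loader, PigStorage(' '), so that the first field holds only the IP.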

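  4. Use Pig to compute the highest temperature of each year in the weather data set temperature.txt. Group the records by year with a GROUP BY statement, iterate over the columns of the relation with the FOREACH operator to find the maximum value in each group, and finally display the result with a DUMP statement.
    [root@master ~]# pig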
    19/05/15 13:37:23 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
    19/05/15 13:37:23 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
    19/05/15 13:37:23 INFO pig.ExecTypeProvider: Trying ExecType : TEZ_LOCAL
    19/05/15 13:37:23 INFO pig.ExecTypeProvider: Trying ExecType : TEZ
    19/05/15 13:37:23 INFO pig.ExecTypeProvider: Picked TEZ as the ExecType
    2019-05-15 13:37:24,074 [main] INFO org.apache.pig.Main - Apache Pig version 0.16.0.2.6.1.0-129 (rexported) compiled May 31 2017, 03:39:20
    2019-05-15 13:37:24,074 [main] INFO org.apache.pig.Main - Logging error messages to: /root/pig_1557927444065.log
    2019-05-15 13:37:24,113 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /root/.pigbootup not found
    2019-05-15 13:37:24,809 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://master.hadoop:8020
    2019-05-15 13:37:25,666 [main] INFO org.apache.pig.PigServer - Pig Script ID for the session: PIG-default-3ee74d64-e4d3-4b87-9f27-0fd239e3274b
    2019-05-15 13:37:26,097 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1.hadoop:8188/ws/v1/timeline/
    2019-05-15 13:37:26,215 [main] INFO org.apache.pig.backend.hadoop.PigATSClient - Created ATS Hook
    grunt> copyFromLocal /opt/temperature.txt /user/root/temp.txt
    grunt> A = load '/user/root/temp.txt' using PigStorage('\t') as (year:int,temperature:int);
    grunt> B = group A by year;
    grunt> C = foreach B generate group,MAX(A.temperature);
    grunt> dump C;
    (2001,24)
    (2002,24)
    (2003,24)
    (2004,24)
    (2005,24)
    (2006,24)
    (2007,24)
    (2008,24)
    (2009,24)
    (2010,24)
    (2011,24)
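
To sanity-check a result like the one above on a small sample, the same group-then-maximum logic can be sketched in plain Python. This is only an illustration of what GROUP BY year followed by MAX(A.temperature) computes, not Pig code. The helper names parse_rows and max_temperature_by_year and the sample values are invented for this sketch, and skipping malformed rows is an assumption of the sketch.

```python
def parse_rows(lines):
    """Parse 'year<TAB>temperature' lines into (int, int) pairs.

    Assumption of this sketch: rows that do not have exactly two integer
    fields are skipped.
    """
    rows = []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2:
            continue
        try:
            rows.append((int(parts[0]), int(parts[1])))
        except ValueError:
            continue
    return rows


def max_temperature_by_year(rows):
    """Return {year: highest temperature seen for that year}."""
    maxima = {}
    for year, temperature in rows:
        if year not in maxima or temperature > maxima[year]:
            maxima[year] = temperature
    return maxima


# Made-up sample data in the same tab-separated layout as temperature.txt.
sample_lines = ["2001\t18", "2001\t24", "2002\t-3", "2002\t21", "bad line"]

for year, hottest in sorted(max_temperature_by_year(parse_rows(sample_lines)).items()):
    print((year, hottest))
```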

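  5. Use Pig to count the number of IP addresses for each country in the data set ip_to_country. Group the records by country with a GROUP BY statement, iterate over the columns of the relation with the FOREACH operator to count the IP addresses in each group, then store the result in the /data/pig/output directory and inspect it.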
    [root@master ~]# pig
    19/05/15 19:00:42 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
    19/05/15 19:00:42 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
    19/05/15 19:00:42 INFO pig.ExecTypeProvider: Trying ExecType : TEZ_LOCAL
    19/05/15 19:00:42 INFO pig.ExecTypeProvider: Trying ExecType : TEZ
    19/05/15 19:00:42 INFO pig.ExecTypeProvider: Picked TEZ as the ExecType
    2019-05-15 19:00:42,203 [main] INFO org.apache.pig.Main - Apache Pig version 0.16.0.2.6.1.0-129 (rexported) compiled May 31 2017, 03:39:20
    2019-05-15 19:00:42,203 [main] INFO org.apache.pig.Main - Logging error messages to: /root/pig_1557946842194.log
    2019-05-15 19:00:42,245 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /root/.pigbootup not found
    2019-05-15 19:00:42,929 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://master.hadoop:8020
    2019-05-15 19:00:43,672 [main] INFO org.apache.pig.PigServer - Pig Script ID for the session: PIG-default-6f9afd1b-b21d-4726-ab79-98e81fc813e3
    2019-05-15 19:00:44,043 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1.hadoop:8188/ws/v1/timeline/
    2019-05-15 19:00:44,153 [main] INFO org.apache.pig.backend.hadoop.PigATSClient - Created ATS Hook
    grunt> copyFromLocal /opt/ip_to_country.txt /user/root/ip_to_country.txt
    grunt> ip_country = load '/user/root/ip_to_country.txt' as (ip:chararray,country:chararray);
    grunt> country_grpd = group ip_country by country;
    grunt> country_counts = foreach country_grpd generate flatten(group),COUNT(ip_country) as counts;
    grunt> store country_counts into '/data/pig/output';
    Vertex Stats:
    VertexId Parallelism TotalTasks InputRecords ReduceInputRecords OutputRecords FileBytesRead FileBytesWritten HdfsBytesRead HdfsBytesWritten Alias Feature Outputs
    scope-19 1 1 248284 0 248284 32 1935 4171199 0 country_counts,country_grpd,ip_country
    scope-20 1 1 0 246 246 1935 0 0 1618 country_counts GROUP_BY /data/pig/output,

    Input(s):
    Successfully read 248284 records (4171199 bytes) from: "/user/root/ip_to_country.txt"

    Output(s):
    Successfully stored 246 records (1618 bytes) in: "/data/pig/output"
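
The LOAD statement in this exercise names no load function, so Pig falls back to its default, PigStorage with a tab delimiter, and STORE writes tab-separated rows in the same way. The plain-Python sketch below mirrors that pipeline on a tiny sample to show the shape of the stored rows, one "country<TAB>count" line per group. It is not Pig code: count_ips_by_country and to_tsv are names invented here, the sample rows are made up, and the rows are sorted only to get a stable order for the example.

```python
from collections import Counter


def count_ips_by_country(lines):
    """Count rows per country from 'ip<TAB>country' lines."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # assumption of this sketch: skip rows without a country
        counts[fields[1]] += 1
    return counts


def to_tsv(counts):
    """Format the counts as tab-separated 'country<TAB>count' rows."""
    return [f"{country}\t{n}" for country, n in sorted(counts.items())]


# Made-up sample rows using documentation-only IP ranges.
sample_lines = [
    "192.0.2.1\tChina",
    "192.0.2.2\tChina",
    "198.51.100.7\tJapan",
    "203.0.113.9\tChina",
]

for row in to_tsv(count_ips_by_country(sample_lines)):
    print(row)
```

Each printed row corresponds to one record of country_counts. In the run above, 246 such records were stored from 248284 input rows.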
