Pig Analysis Practice

Destiny_-Sky

Last modified 2022-02-25 17:36:10

Category: Pig    Tags: hdfs, hadoop, big data

First published 2022-02-25 17:27:06

Article link: https://blog.csdn.net/weixin_54051652/article/details/123137441

1. Prepare the Data

Create an input directory in HDFS and upload any local text file into it. This file is the input for the Pig word count below:

hdfs dfs -mkdir /input
hdfs dfs -ls /

Any text file will do. Here we analyze README.txt from the Hadoop installation directory and upload it to HDFS:

cd /usr/cstor/hadoop


cat README.txt

hdfs dfs -put README.txt /input

hdfs dfs -ls /input

2. Start the Grunt Shell

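cd /root/soft/pig
pig -x mapreduce

MapReduce mode (also the default when -x is omitted) reads its input from HDFS, which is where README.txt was uploaded. pig -x local runs against the local file system instead; in that mode, load README.txt by its local path.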

Read and transform the data

a = load 'hdfs://master:8020/input/README.txt' as (line:chararray);   -- read the HDFS file line by line

b = foreach a generate flatten(TOKENIZE(line, '\t ,.')) as word;   -- split each line into words on tab, space, comma and period

dump b;
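To see what the tokenizing step does to each line, here is a rough local approximation in shell. It is a sketch, not Pig itself: the function name tokenize is hypothetical, and it simply maps the same four delimiters passed to TOKENIZE ('\t ,.', i.e. tab, space, comma and period) to newlines. Like TOKENIZE, it does not emit empty tokens, and any other punctuation (colons, parentheses, apostrophes) stays attached to the word.

```shell
#!/usr/bin/env bash
# Hypothetical helper: approximates flatten(TOKENIZE(line, '\t ,.'))
# on the local machine. It is an illustration only, not Pig itself.
tokenize() {
  # Map tab, space, comma and period to newlines; -s squeezes runs of
  # newlines so consecutive delimiters do not produce empty tokens.
  # grep -v '^$' drops the empty line a leading delimiter can leave.
  tr -s '\t ,.' '\n' | grep -v '^$'
}

# Usage: preview the words Pig would emit for one line.
# Prints Hello, world, foo and bar on separate lines.
printf 'Hello, world.\tfoo bar\n' | tokenize
```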

c = group b by word;   -- group identical words together
dump c;

d = foreach c generate group, COUNT(b) as count;   -- count the occurrences of each word
dump d;

e = order d by count desc;   -- sort the words by count, in descending order
dump e;
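To sanity-check the Pig result without a cluster, the same word count can be approximated with standard Unix tools on the local copy of the file. This is a sketch under assumptions: the function name word_count is hypothetical, and the input path is assumed to be the local README.txt from step 1. Each stage mirrors one Pig step (tokenize, group, count, order), and the output is printed as (word,count) tuples to match the shape of dump. Like Pig, the comparison is case-sensitive, so "The" and "the" are counted separately.

```shell
#!/usr/bin/env bash
# Hypothetical local cross-check of the Pig word count, using standard
# Unix tools instead of Pig. Assumes the input is a local text file.
word_count() {
  # tr/grep: split into words, like TOKENIZE(line, '\t ,.')
  # sort:    bring identical words together, like GROUP b BY word
  # uniq -c: count each run of identical words, like COUNT(b)
  # sort -k1,1nr: order by count, descending, like ORDER d BY count DESC;
  #          -k2,2 breaks ties alphabetically (Pig does not guarantee tie order)
  # awk:     print (word,count) tuples in the same shape as dump
  tr -s '\t ,.' '\n' < "$1" \
    | grep -v '^$' \
    | LC_ALL=C sort \
    | uniq -c \
    | LC_ALL=C sort -k1,1nr -k2,2 \
    | awk '{ printf "(%s,%s)\n", $2, $1 }'
}

# Usage (local path assumed from step 1):
# word_count /usr/cstor/hadoop/README.txt | head -n 5
```

The counts should agree with the output of dump e; only the order of words with equal counts may differ.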

OK, the analysis is complete: the most frequent word is "the", which appears 8 times.
