Pig Basic Data Structures
Basic Pig Latin commands:
1) sh cmd: runs the Linux command cmd from the Grunt shell
2) grunt> records = load 'hdfs://localhost:9000/input/test' as (value:int, age:int, apliy:chararray);
Loads the source file from HDFS into the relation records. value, age, and apliy are its Fields, each row is a Tuple, and records itself is a Bag.
3) grunt> dump records;
Prints the contents of records.
4) grunt> describe records;
Prints the schema of records.
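5) grunt> filtered_records = FILTER records BY temperature != 9999 AND (quality == 0);
FILTER works like the WHERE clause in SQL: only tuples that satisfy the condition are kept. This step and the ones after it assume that records has year, temperature, and quality fields, which differs from the schema loaded in step 2.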
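6) grunt> grouped_records = GROUP filtered_records BY year;
Groups the tuples by the year field. Each output tuple holds the group key (named group) and a bag (named filtered_records) containing the tuples that share that year.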
7) grunt> max_temp = FOREACH grouped_records GENERATE group, MAX(filtered_records.temperature);
FOREACH processes each group in turn and emits the year together with the highest temperature in that group, for example:
(1949,111)
(1950,22)
8) grunt> store max_temp into 'temp' using PigStorage(':');
Saves max_temp to the output path temp, with the Fields of each tuple separated by a colon (:).
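
To make the semantics of steps 5 to 8 concrete, here is a minimal Python sketch of the same filter, group, max, and store logic. It is not Pig and not how Pig executes the script. The sample rows, the function names, and the (year, temperature, quality) tuple layout are assumptions made for illustration, and store() returns text instead of writing to HDFS.

```python
from collections import defaultdict

# Hypothetical sample Bag: each Tuple is (year, temperature, quality).
# These rows are invented for illustration, not taken from a real input file.
records = [
    (1950, 0, 0),
    (1950, 22, 0),
    (1950, 9999, 0),   # dropped by the filter: 9999 marks a missing reading
    (1949, 111, 0),
    (1949, 78, 0),
    (1949, 120, 1),    # dropped by the filter: quality is not 0
]


def filter_records(bag):
    """Mirror of: FILTER records BY temperature != 9999 AND (quality == 0)."""
    return [t for t in bag if t[1] != 9999 and t[2] == 0]


def group_by_year(bag):
    """Mirror of: GROUP filtered_records BY year. Maps year -> list of tuples."""
    groups = defaultdict(list)
    for t in bag:
        groups[t[0]].append(t)
    return dict(groups)


def max_temp(grouped):
    """Mirror of: FOREACH ... GENERATE group, MAX(filtered_records.temperature).

    Sorted only to make the output deterministic; Pig itself does not
    guarantee any order here.
    """
    return sorted((year, max(t[1] for t in tuples))
                  for year, tuples in grouped.items())


def store(bag, delimiter=":"):
    """Mirror of PigStorage(':'): one line per tuple, fields joined by delimiter."""
    return "\n".join(delimiter.join(str(field) for field in t) for t in bag)


result = max_temp(group_by_year(filter_records(records)))
print(store(result))
```

With these sample rows the grouped maxima are (1949,111) and (1950,22), so the stored text has one line per year with the two fields separated by a colon.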