Pig Installation

pig-0.12.0-cdh5.2.0


Install path: /opt/dev/pig/pig-0.12.0-cdh5.2.0
Start: pig
Exit: quit;
Environment variables (add to /etc/profile, then reload it)
    export PIG_HOME=/opt/dev/pig/pig-0.12.0-cdh5.2.0
    export PATH=$PATH:$PIG_HOME/bin
    export PIG_CLASSPATH=$HADOOP_HOME/etc/hadoop
    source /etc/profile
Start in a specific mode
    pig -x local
    pig -x mapreduce



Usage:

1. Local mode: can read files from the local file system
2. MapReduce mode: can only read files stored on HDFS
3. Syntax

(1) load: load data
    load 'filePath' [using PigStorage(',')] [as (data structure)];
    records = load '/opt/dev/pig/temp/t.txt' as (year: chararray,temperature: int);
    records = load 'hdfs://wanggang:9000/bonc/student.txt' using PigStorage(',') as(classNo:chararray,studNo:chararray, score:int);
(2) store: store data
    store data into 'filePath' [using PigStorage(':')];
    store records into 'hdfs://localhost:9000/bonc/student_out' using PigStorage(':');
(3) filter: filter (select) data
    filter data by conditions;
    records_c01 = filter records by classNo=='C01';
(4) group: group data
    group data by field [parallel 2];
    parallel 2 sets the number of reduce tasks for this operation to 2
    grouped_records = group records by classNo parallel 2;
    grouped_records = group valid_records by year;
(5) foreach: iterate over data
    foreach loops over every record in a relation and generates a new relation with the specified schema.
    foreach data generate structure;
    score_c01 = foreach records_c01 generate 'Teacher',$1,score;
    max_temperature = foreach grouped_records generate group,MAX(valid_records.temperature);
(6) join: join data
    join data by field, data by field;
    r_joined = join r_student by classNo,r_teacher by classNo;
(7) cross: cross-join data (similar to a Cartesian product)
    cross data, data;
    r = cross r_student,r_teacher;
(8) order: sort data
    order data by field [asc|desc], ...;
    r = order r_student by score desc, classNo asc;
(9) union: union data
    union data, data;
    r_union = union r_student, r_teacher;
(10) dump: print data
    dump data;
(11) describe: show the schema of the data
    describe data;
4. Example:
Input file:
1990 21
1990 18
1991 21
1992 30
1992 999
1990 23
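The listing above shows the two columns separated by whitespace. A load statement that names no delimiter falls back to the PigStorage default, which is a tab, so the file most likely needs tab-separated fields for year and temperature to parse as intended. A minimal sketch for creating such a file follows; the relative path t.txt is an assumption made for illustration, so copy the result to whatever path the load statement uses.

```shell
# Assumption: the sample is written to ./t.txt in the current directory.
# Move or copy it to the path referenced by the load statement.
# printf reuses the format for each pair of arguments, producing
# one "year<TAB>temperature" line per pair.
printf '%s\t%s\n' \
    1990 21 \
    1990 18 \
    1991 21 \
    1992 30 \
    1992 999 \
    1990 23 > t.txt

# cut splits on tabs by default, so printing column 2 confirms
# that the delimiter really is a tab.
cut -f2 t.txt
```

If cut prints the temperatures one per line, the file is tab-delimited. If it prints whole lines unchanged, the fields are separated by something else.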
Script:
records = load '/opt/dev/pig/temp/t.txt' as (year: chararray,temperature: int);
dump records;
describe records;
valid_records = filter records by temperature!=999;
grouped_records = group valid_records by year;
dump grouped_records;
describe grouped_records;
max_temperature = foreach grouped_records generate group,MAX(valid_records.temperature);
dump max_temperature;
