A small Pig gotcha (no data after FILTER)


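When writing Pig scripts, take care with any field that you compare or compute on, especially inside a FILTER.

It is best to leave that field as bytearray rather than declaring it as int or chararray; otherwise the filter may return no data.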


-- Extract the call detail records whose procedure number is attach, then aggregate them
set job.name 'lte_kpi_user_analysis_s1mme_attatch_15min';

allData = load '/user/hdfs/data/200506/20050601/000000/0B43CED07C7438B4328F918C64DACE35-merge';


sel_fields = foreach allData generate FLOOR($10/300)*300+8*3600 as (time:chararray), $2 as (interface_type:chararray), $8 as (procedure_type:bytearray),
    $7 as (msisdn:chararray), $6 as (imei:chararray), $5 as (imsi:chararray), $1 as (city:chararray), $21 as (mme_ip:chararray), $22 as (enodeb_ip:chararray),
    $25 as (tac:chararray), $26 as (cell_id:chararray), $12 as (cause:int), ($10-$9) as (delay:int), $11 as (procedure_status:bytearray);


-- Keep only the records whose procedure number equals 1

s1mme_attach = filter sel_fields by procedure_type == 1;


group_by_user = group s1mme_attach by (time, interface_type, procedure_type, msisdn, imei, imsi, city, mme_ip,
   enodeb_ip, tac, cell_id, cause);
   

tmp_result = foreach group_by_user {
   
  failures = filter s1mme_attach by procedure_status>0;
   
   user_num = distinct s1mme_attach.imsi;
   
   generate FLATTEN(group), COUNT(failures) as count_fail, COUNT(user_num) as count_total, AVG(s1mme_attach.delay);
   
};

-- Test output

filter_tmp = limit s1mme_attach 10;

test_out = limit group_by_user 10;

test_out2 = limit tmp_result 10;

store  filter_tmp into  '/user/hdfs/xiatao/pigOutput/s1mme_attatch_15min_filter1';

store test_out into '/user/hdfs/xiatao/pigOutput/s1mme_attatch_15min_1';

store test_out2 into '/user/hdfs/xiatao/pigOutput/s1mme_attatch_15min_2';

If bytearray is not used for these fields, the output directories written by the store statements above contain no data.
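
To look into why a typed field produces no rows, a small debugging sketch along the following lines may help. It reuses the relation names allData and sel_fields from the script above; the names typed, bad and bad_sample are hypothetical and introduced only for illustration. It assumes that the empty output comes from a cast that fails (a failed cast from bytearray to int normally yields null, and a null never satisfies a comparison), which the original script does not confirm.

```pig
-- Print the schema Pig assigned to each field after the foreach
describe sel_fields;

-- Cast the procedure-number column explicitly (assumption: column $8, as in the script above)
typed = foreach allData generate (int)$8 as procedure_type;

-- Rows where the cast produced null would be dropped by "procedure_type == 1"
bad = filter typed by procedure_type is null;

-- Inspect a handful of them
bad_sample = limit bad 10;
dump bad_sample;
```

If bad_sample is not empty, the raw values in that column are probably not plain integers (for example, the file may use a delimiter other than the tab that the default loader expects), which would be consistent with the empty output described above.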
