pig flatten

今天通过不断的尝试,终于知道这个flatten的用法了。其实吧,有时候关键是要test,才能充分理解解说。不过,同事给说的有点问题,误导了我。整的我一直没明白怎么回事。

这是官方的解释:

The FLATTEN operator looks like a UDF syntactically, but it is actually an operator that changes the structure of tuples and bags in a way that a UDF cannot. Flatten un-nests tuples as well as bags. The idea is the same, but the operation and result is different for each type of structure.

For tuples, flatten substitutes the fields of a tuple in place of the tuple. For example, consider a relation that has a tuple of the form (a, (b, c)). The expression GENERATE $0, flatten($1), will cause that tuple to become (a, b, c).

For bags, the situation becomes more complicated. When we un-nest a bag, we create new tuples. If we have a relation that is made up of tuples of the form ({(b,c),(d,e)}) and we apply GENERATE flatten($0), we end up with two tuples (b,c) and (d,e). When we remove a level of nesting in a bag, sometimes we cause a cross product to happen. For example, consider a relation that has a tuple of the form (a, {(b,c), (d,e)}), commonly produced by the GROUP operator. If we apply the expression GENERATE $0, flatten($1) to this tuple, we will create new tuples: (a, b, c) and (a, d, e).

我试验下来也是这样的,我今天把第一种和第二种情况都尝试了,实验证明,即使是第二种,其实一次flatten就够了,就得到schema了。这样的数据,

Joe {(Joe,18,3.8)}
Bill {(Bill,20,3.9)}
John {(John,18,4.0)}
Mary {(Mary,19,3.8),(Mary,19,5.0)}

a = load 'result' as (f1:chararray,B: bag {T: tuple(t1:chararray, t2:int, t3:float)});

b = foreach a GENERATE FLATTEN(B) as (t1:chararray,t2:int,t3:float);

这个是可以一次性flatten的。但是更高的复杂度我每测试,应该是需要两次这种操作的吧。真是真是对bag, tuple也长了见识了。明天看看能否把数据传输到UDF中操作。


总结一句话,在不确定时要首先看官方文档,然后就先拿小数据测试一下,看看每一步得到的是什么结构describe,同时store后看看是什么结果,是否和自己想的一样。整体来说还是很清晰的。

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值