flatten

caoeryingzi

Article link: https://blog.csdn.net/caoeryingzi/article/details/7968257

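After a lot of trial and error today, I finally understand how FLATTEN works. Sometimes the key is to actually test things; only then does the explanation really sink in. A colleague's explanation was slightly off and misled me, which is why I could not work out what was going on for so long.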

Here is the official explanation:

The FLATTEN operator looks like a UDF syntactically, but it is actually an operator that changes the structure of tuples and bags in a way that a UDF cannot. Flatten un-nests tuples as well as bags. The idea is the same, but the operation and result is different for each type of structure.

For tuples, flatten substitutes the fields of a tuple in place of the tuple. For example, consider a relation that has a tuple of the form (a, (b, c)). The expression GENERATE $0, flatten($1), will cause that tuple to become (a, b, c).
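To make the tuple case concrete, here is a minimal sketch in Pig Latin. The file name `tuples.txt` and the field names `f0`, `t`, `x`, `y` are made up for illustration; the only assumption is a tab-separated input whose second column is a tuple written as `(b,c)`.

```pig
-- Hypothetical input 'tuples.txt', one record per line, tab-separated:
--   a    (b,c)
r = LOAD 'tuples.txt' AS (f0:chararray, t:tuple(x:chararray, y:chararray));

-- FLATTEN replaces the tuple t with its fields, so each record goes
-- from (a, (b, c)) to (a, b, c). The row count does not change.
s = FOREACH r GENERATE f0, FLATTEN(t);

-- The flattened fields keep their origin as a prefix (t::x, t::y).
DESCRIBE s;
DUMP s;
```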

For bags, the situation becomes more complicated. When we un-nest a bag, we create new tuples. If we have a relation that is made up of tuples of the form ({(b,c),(d,e)}) and we apply GENERATE flatten($0), we end up with two tuples (b,c) and (d,e). When we remove a level of nesting in a bag, sometimes we cause a cross product to happen. For example, consider a relation that has a tuple of the form (a, {(b,c), (d,e)}), commonly produced by the GROUP operator. If we apply the expression GENERATE $0, flatten($1) to this tuple, we will create new tuples: (a, b, c) and (a, d, e).
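The bag case after a GROUP can be sketched the same way. The relation and file names below (`students`, `students.txt`) are hypothetical; the fields mirror the name, age, GPA data used later in this post.

```pig
-- Hypothetical input 'students.txt', tab-separated: name, age, gpa
students = LOAD 'students.txt' AS (name:chararray, age:int, gpa:float);

-- GROUP yields one record per key: (group, {(name, age, gpa), ...})
grouped = GROUP students BY name;
DESCRIBE grouped;

-- Flattening the bag emits one output record per tuple in the bag,
-- each paired with the group key (the "cross product" in the text above).
-- A key whose bag holds two tuples therefore produces two records.
flat = FOREACH grouped GENERATE group, FLATTEN(students);
DESCRIBE flat;
DUMP flat;
```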

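My experiments confirm this. Today I tried both the first and the second case, and it turns out that even in the second case a single FLATTEN is enough to get the schema. Given data like this: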

Joe {(Joe,18,3.8)}
Bill {(Bill,20,3.9)}
John {(John,18,4.0)}
Mary {(Mary,19,3.8),(Mary,19,5.0)}

a = load 'result' as (f1:chararray,B: bag {T: tuple(t1:chararray, t2:int, t3:float)});

b = foreach a GENERATE FLATTEN(B) as (t1:chararray,t2:int,t3:float);

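This can be flattened in a single pass. I have not tested more deeply nested structures, but those presumably need two such operations. I really learned a lot about bags and tuples today. Tomorrow I will see whether I can pass the data into a UDF.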
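For the untested deeper case, the following is only a sketch of what two passes might look like, assuming a bag nested inside another bag. The input name `nested` and the fields `ob`, `k`, `ib`, `v` are all hypothetical and do not come from the data above.

```pig
-- Hypothetical relation: each record holds a bag whose tuples contain another bag.
n = LOAD 'nested' AS (f1:chararray,
                      ob:bag{t:tuple(k:chararray, ib:bag{u:tuple(v:int)})});

-- First pass removes the outer bag: (f1, k, ib), one record per outer tuple.
step1 = FOREACH n GENERATE f1, FLATTEN(ob);
DESCRIBE step1;

-- Second pass removes the inner bag: (f1, k, v), one record per inner tuple.
step2 = FOREACH step1 GENERATE f1, ob::k, FLATTEN(ob::ib);
DESCRIBE step2;
```

Each FLATTEN removes exactly one level of nesting, which is why this layout needs two FOREACH steps while the single-level bag above needed only one.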

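To sum up: when in doubt, read the official documentation first, then test with a small dataset. Use describe to check the structure produced at each step, and store the output to see whether the result matches what you expected. Overall, it is all quite clear.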
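That checking routine could look roughly like this, reusing the `result` file and schema from above. The output path `flatten_out` is a placeholder, and the LIMIT of 10 is an arbitrary sample size.

```pig
a = LOAD 'result' AS (f1:chararray, B:bag{T:tuple(t1:chararray, t2:int, t3:float)});
DESCRIBE a;                        -- structure before flattening

b = FOREACH a GENERATE f1, FLATTEN(B);
DESCRIBE b;                        -- structure after flattening

-- Look at a handful of records before writing anything out.
sample_rows = LIMIT b 10;
DUMP sample_rows;

-- Write the full result and inspect the files; the output directory must not already exist.
STORE b INTO 'flatten_out';
```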
