-- For both events and clicks, keep the grouping fields and wrap the full record (all fields) in a tuple named actual.
events = foreach events generate opxpid, client_id, TOTUPLE(*) as actual;
clicks = foreach clicks generate opxpid, client_id, TOTUPLE(*) as actual;
-- Merge the two relations into one stream.
cstream = union events, clicks;
-- Group the merged stream by the shared key (opxpid, client_id).
grpd = group cstream by (opxpid, client_id) parallel 18;
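-- Unpack the grouped stream: the first FLATTEN emits one row per actual tuple in each group's bag,
-- and the second FLATTEN expands each tuple back into its original fields.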
strmi = foreach grpd generate FLATTEN(cstream.actual);
strmi = foreach strmi generate FLATTEN(actual);
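-- Optional check, not part of the original script: print the schemas to confirm
-- that the grouped bag and the flattened output have the expected shape.
-- If events and clicks carry different fields, the schema of actual may not be fully known after the union.
describe grpd;
describe strmi;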
Pig: grouping multiple relations by a shared set of attributes