Pig: Introduction to Pig Latin - 4

  • cogroup

cogroup is a generalization of group. Instead of collecting the records of one input based on a key, it collects the records of n inputs based on a key. The result is a record with a key and one bag for each input; each bag contains all records from that input that have the given value for the key.

A = load 'input1' as (id:int, val:float);
B = load 'input2' as (id:int, val2:int);
C = cogroup A by id, B by id;
describe C;
-- C: {group: int,A: {id: int,val: float},B: {id: int,val2: int}}

Another way to think of cogroup is as the first half of a join: the records for each key are collected together, but the cross product between them is not computed.
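
The "first half of a join" idea can be made concrete by finishing the join by hand. In the minimal sketch below, which reuses the A and B relations from the cogroup example, a foreach flattens both bags. For each key this produces the cross product of the matching records, and a key whose bag is empty in either input produces no output, because flattening an empty bag discards that record. The result should contain the same records as join on the same key, assuming id is never null (join drops null keys, while cogroup keeps them). The relation names D and J are only illustrative.

```pig
A = load 'input1' as (id:int, val:float);
B = load 'input2' as (id:int, val2:int);
C = cogroup A by id, B by id;
-- Flattening two bags yields their cross product for each key;
-- an empty bag yields no records, so keys missing from either input disappear.
D = foreach C generate flatten(A), flatten(B);
-- The built-in operator, for comparison (same records if id is never null).
J = join A by id, B by id;
describe D;
describe J;
```

Writing the join this way is mainly useful for understanding; in practice, join states the intent directly and lets Pig choose the implementation.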

  • union

union puts two or more data sets together by concatenating them rather than joining them.

A = load 'input1' as (x:int, y:float);
B = load 'input2' as (x:int, y:float);
C = union A, B;
describe C;
-- C: {x: int,y: float}
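
Unlike SQL's UNION, Pig's union does not remove duplicate records, and it does not preserve the order of its inputs. The minimal sketch below, assuming a hypothetical third file 'input3' with the same schema, shows that union accepts more than two relations and that a following distinct removes duplicates when set semantics are wanted. The names all_rows and unique_rows are illustrative.

```pig
A = load 'input1' as (x:int, y:float);
B = load 'input2' as (x:int, y:float);
-- 'input3' is a hypothetical third file with the same schema
D = load 'input3' as (x:int, y:float);
-- union takes two or more relations; duplicate records are kept
all_rows = union A, B, D;
-- distinct removes duplicates if set semantics are needed
unique_rows = distinct all_rows;
describe unique_rows;
```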

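When the inputs have different schemas, use union onschema. It requires every input to have a schema; Pig matches fields by name, casts fields that share a name but not a type to a common type (y becomes double below), and includes every field that appears in either input: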

A = load 'input1' as (w:chararray, x:int, y:float);
B = load 'input2' as (x:int, y:double, z:chararray);
C = union onschema A, B;
describe C;
-- C: {w: chararray,x: int,y: double,z: chararray}
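
With union onschema, a field missing from one input is null in the records that came from that input: here w is null for records from input2, and z is null for records from input1. The minimal sketch below replaces those nulls with placeholders using Pig's bincond operator (cond ? a : b). The default strings 'unknown' and 'none' are illustrative choices, not something Pig requires, and any nulls already present in the data are replaced as well.

```pig
A = load 'input1' as (w:chararray, x:int, y:float);
B = load 'input2' as (x:int, y:double, z:chararray);
C = union onschema A, B;
-- Records from input2 have w = null; records from input1 have z = null.
-- Replace those nulls with placeholder strings (values are illustrative).
D = foreach C generate
        (w is null ? 'unknown' : w) as w,
        x,
        y,
        (z is null ? 'none' : z) as z;
describe D;
```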

  • cross

cross matches the mathematical set operation of the same name. In the following Pig Latin, cross takes every record in NYSE_daily and combines it with every record in NYSE_dividends:

daily = load 'NYSE_daily' as (exchange:chararray, symbol:chararray, date:chararray,
            open:float, high:float, low:float, close:float, volume:int, adj_close:float);
divs  = load 'NYSE_dividends' as (exchange:chararray, symbol:chararray, date:chararray, dividends:float);
tonsodata = cross daily, divs parallel 10;
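
Because cross pairs every daily record with every dividend record, its output has as many records as the product of the two input sizes, so it is usually followed by a filter. The parallel 10 clause above asks Pig to use 10 reduce tasks for the cross. The minimal sketch below builds a non-equi (theta) join this way, keeping, for each stock, the trading days that precede a dividend date. It assumes the date fields hold ISO-formatted strings such as 2009-12-31, so comparing them as chararrays orders them chronologically. After cross, fields are disambiguated with the daily:: and divs:: prefixes; the names crossed and before_div are illustrative.

```pig
daily = load 'NYSE_daily' as (exchange:chararray, symbol:chararray, date:chararray,
            open:float, high:float, low:float, close:float, volume:int, adj_close:float);
divs  = load 'NYSE_dividends' as (exchange:chararray, symbol:chararray, date:chararray,
            dividends:float);
crossed = cross daily, divs parallel 10;
-- After cross, fields carry the name of the relation they came from.
-- Keep same-symbol pairs where the trading day precedes the dividend date
-- (string comparison assumes ISO yyyy-MM-dd dates).
before_div = filter crossed by daily::symbol == divs::symbol
                           and daily::date < divs::date;
describe before_div;
```

For a pure equality condition, such as matching symbols alone, join is far cheaper than cross followed by filter; cross earns its cost only when the condition cannot be expressed as key equality.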
