Pig—MultiQuery Execution

 

The first script:

A = LOAD '/user/input/t.txt' as (k:chararray, c:int);
B = group A BY k;
C = foreach B generate group, SUM(A.c);
store C into '/user/output/test1.out';
DUMP C;
store C into '/user/output/test2.out';

The second script:

A = LOAD '/user/input/t.txt' as (k:chararray, c:int);
B = group A BY k;
C = foreach B generate group, SUM(A.c);
store C into '/user/output/test1.out';
store C into '/user/output/test2.out';

With multi-query execution, Pig processes an entire script or a batch of statements at once. It creates a batch job to process the data.

Turning it On or Off

Multi-query execution is turned on by default. To turn it off and revert to Pig's "execute-on-dump/store" behavior, use the "-M" or "-no_multiquery" options.

To run script "myscript.pig" without the optimization, execute Pig as follows:

$ pig -M myscript.pig

or

$ pig -no_multiquery myscript.pig

The first script produces three MapReduce jobs, one for each of:

1. store C into '/user/output/test1.out'

2. DUMP C

3. store C into '/user/output/test2.out'

The second script produces only one MapReduce job.

If we run the second script with multi-query execution turned off (pig -no_multiquery test.pig), it produces two jobs, one for each store statement.

Store vs. Dump

With multi-query execution, you want to use STORE to save (persist) your results. You do not want to use DUMP, as it will disable multi-query execution and is likely to slow down execution. (If you have included DUMP statements in your scripts for debugging purposes, you should remove them.)
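As a minimal sketch of that advice, the first script can be rewritten so that the debugging DUMP becomes a STORE. The output path '/user/output/debug_c.out' is a hypothetical name chosen for this illustration and is not part of the original scripts.

```pig
-- Same pipeline as the first script, with the DUMP replaced by a STORE
-- so that no statement forces an early, separate execution.
A = LOAD '/user/input/t.txt' as (k:chararray, c:int);
B = group A BY k;
C = foreach B generate group, SUM(A.c);

store C into '/user/output/test1.out';
-- Was: DUMP C;  (hypothetical debug path, assumed for this sketch)
store C into '/user/output/debug_c.out';
store C into '/user/output/test2.out';
```

The debug output can then be inspected after the run, for example with hadoop fs -cat on the part files under that directory, and the extra store can be deleted once debugging is finished.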

 

 

 

 