A = LOAD '/user/input/t.txt' as (k:chararray, c:int);
B = group A BY k;
C = foreach B generate group, SUM(A.c);
store C into '/user/output/test1.out';
DUMP C;
store C into '/user/output/test2.out';
A = LOAD '/user/input/t.txt' as (k:chararray, c:int);
B = group A BY k;
C = foreach B generate group, SUM(A.c);
store C into '/user/output/test1.out';
store C into '/user/output/test2.out';
With multi-query execution, Pig processes an entire script or a batch of statements at once and creates a single batch job to process the data.
Turning it On or Off
Multi-query execution is turned on by default. To turn it off and revert to Pig's "execute-on-dump/store" behavior, use the "-M" or "-no_multiquery" options.
To run script "myscript.pig" without the optimization, execute Pig as follows:
$ pig -M myscript.pig
or
$ pig -no_multiquery myscript.pig
The first script produces three MapReduce jobs, one for each of:
1. store C into '/user/output/test1.out'
2. DUMP C
3. store C into '/user/output/test2.out'
while the second script produces only one MapReduce job.
If we run the second script with pig -no_multiquery test.pig, it produces two jobs, one for each store statement.
Store vs. Dump
With multi-query execution, you want to use STORE to save (persist) your results. You do not want to use DUMP, as it will disable multi-query execution and is likely to slow down execution. (If you have included DUMP statements in your scripts for debugging purposes, you should remove them.)
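As a rough sketch of that advice, a debugging DUMP can be swapped for a STORE to a scratch location, so the script contains only STORE statements and Pig can still treat it as one batch. The path '/user/output/debug.out' is a hypothetical name chosen for illustration; the other names come from the scripts above.

```pig
A = LOAD '/user/input/t.txt' as (k:chararray, c:int);
B = group A BY k;
C = foreach B generate group, SUM(A.c);
store C into '/user/output/test1.out';
-- was: DUMP C;  (hypothetical debug path, inspect it after the run)
store C into '/user/output/debug.out';
store C into '/user/output/test2.out';
```

The debug output can then be inspected after the run, for example with hadoop fs -cat on the files under that directory.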