2021SC@SDUSC
Overview
This post continues the analysis of Pig, Hadoop's lightweight scripting language for operating on Hadoop, looking at the code of the LimitAdjuster class under the executionengine package.
The LimitAdjuster class
LimitAdjuster: literally a "limit adjuster", it rewrites the MapReduce plan so that a LIMIT is enforced correctly.
The visitMROp method
Finds map-reduce operators that contain a limit operator. Each one found is recorded in opsToAdjust, so that adjust() can later add an extra single-reducer map-reduce job to the plan.
public void visitMROp(MapReduceOper mr) throws VisitorException {
    if (mr.limit != -1 || mr.limitPlan != null) {
        opsToAdjust.add(mr);
    }
}
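The collection step above can be sketched in plain Java. MROp here is a hypothetical stand-in carrying only the two fields visitMROp inspects, not Pig's real MapReduceOper:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Pig's MapReduceOper, reduced to the two
// fields that visitMROp checks (illustration only).
class MROp {
    int limit = -1;          // -1 means "no limit set"
    Object limitPlan = null;
    MROp(int limit) { this.limit = limit; }
}

public class LimitCollector {
    // Mirrors visitMROp: collect every operator that carries a limit,
    // so a later adjust() pass can rewrite each of them.
    static List<MROp> collect(List<MROp> all) {
        List<MROp> opsToAdjust = new ArrayList<>();
        for (MROp mr : all) {
            if (mr.limit != -1 || mr.limitPlan != null) {
                opsToAdjust.add(mr);
            }
        }
        return opsToAdjust;
    }
}
```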
The splitReducerForLimit method
Moves all operators between POLimit and POStore in the reduce plan of firstMROp into secondMROp.
private void splitReducerForLimit(MapReduceOper secondMROp,
        MapReduceOper firstMROp) throws PlanException, VisitorException {
    PhysicalOperator op = firstMROp.reducePlan.getRoots().get(0);
    assert(op instanceof POPackage);
    // Walk down from the root until just past POLimit; the operators
    // from this point on are the ones to move into secondMROp.
    while (true) {
        List<PhysicalOperator> succs = firstMROp.reducePlan
                .getSuccessors(op);
        if (succs == null) break;
        op = succs.get(0);
        if (op instanceof POLimit) {
            op = firstMROp.reducePlan.getSuccessors(op).get(0);
            break;
        }
    }
}
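The walk above can be illustrated on a simplified, linear "reduce plan" modeled as a list of operator names. This is a hypothetical model (Pig's real plan is a DAG of PhysicalOperators), meant only to show which span of operators the method ends up moving:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the splitReducerForLimit walk on a linear plan of
// operator names (hypothetical; not Pig's actual plan API).
public class ReducerSplitter {
    // Returns the operators that come after "POLimit" (through the
    // store), i.e. the span the real method moves to the second job.
    static List<String> operatorsAfterLimit(List<String> reducePlan) {
        List<String> moved = new ArrayList<>();
        int i = reducePlan.indexOf("POLimit");
        if (i < 0) return moved;  // no limit in this plan: nothing to move
        for (int j = i + 1; j < reducePlan.size(); j++) {
            moved.add(reducePlan.get(j));
        }
        return moved;
    }
}
```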
The adjust method
Splits the original reduce plan into two MapReduce jobs:
First: from the root (POPackage) to POLimit
Second: from POLimit to the leaf (POStore), duplicating POLimit
public void adjust() throws IOException, PlanException {
    FileSpec fSpec = new FileSpec(FileLocalizer.getTemporaryPath(pigContext).toString(),
            new FuncSpec(Utils.getTmpFileCompressorName(pigContext)));
    // Turn the original store into a temporary store that the
    // follow-up job will read back.
    POStore storeOp = (POStore) mpLeaf;
    storeOp.setSFile(fSpec);
    storeOp.setIsTmpStore(true);
    mr.setReduceDone(true);
    // Build the extra map-reduce job that re-applies the limit.
    MapReduceOper limitAdjustMROp = new MapReduceOper(new OperatorKey(scope, nig.getNextNodeId(scope)));
    POLoad ld = new POLoad(new OperatorKey(scope, nig.getNextNodeId(scope)));
    ld.setPc(pigContext);
    ld.setLFile(fSpec);
    ld.setIsTmpLoad(true);
    limitAdjustMROp.mapPlan.add(ld);
    if (mr.isGlobalSort()) {
        connectMapToReduceLimitedSort(limitAdjustMROp, mr);
    } else {
        MRUtil.simpleConnectMapToReduce(limitAdjustMROp, scope, nig);
    }
    splitReducerForLimit(limitAdjustMROp, mr);
    if (mr.isGlobalSort()) {
        limitAdjustMROp.setLimitAfterSort(true);
        limitAdjustMROp.setSortOrder(mr.getSortOrder());
    }
}
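Conceptually, adjust() splits one reduce pipeline into two jobs at POLimit, duplicating POLimit so the second (single-reducer) job enforces the final count. A minimal sketch of that split, again on a hypothetical linear model of the plan rather than Pig's real API:

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual sketch of the adjust() split: one reduce pipeline
// becomes two jobs, with POLimit duplicated into the second one.
public class AdjustSketch {
    static List<List<String>> split(List<String> reducePlan) {
        int i = reducePlan.indexOf("POLimit");
        // Job 1: root (POPackage) through POLimit, ending in a temp store.
        List<String> first = new ArrayList<>(reducePlan.subList(0, i + 1));
        first.add("POStore(tmp)");
        // Job 2: temp load, the duplicated POLimit, then the rest of the plan.
        List<String> second = new ArrayList<>();
        second.add("POLoad(tmp)");
        second.add("POLimit");
        second.addAll(reducePlan.subList(i + 1, reducePlan.size()));
        return List.of(first, second);
    }
}
```

The temp store/load pair in the sketch corresponds to the fSpec temporary file that adjust() writes with setIsTmpStore and reads back with setIsTmpLoad.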