Shandong University Software Engineering Application and Practice: Pig Code Analysis (12)

Latest recommended article published on 2021-12-23 19:09:00

Tdqiu

Article tags: pig

Article link: https://blog.csdn.net/weixin_54263893/article/details/122114414

2021SC@SDUSC

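Overview
This installment continues the analysis of Pig, a lightweight scripting language for working with Hadoop, and covers the MultiQueryOptimizer class in the executionengine package.
The class extends MROpPlanVisitor and implements an optimizer that merges all or some of the splittee MapReduceOpers into the splitter MapReduceOper. The optimizer merges these MapReduceOpers by replacing the POLoad/POStore combination with a POSplit operator.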

The isDiamondMROper method

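An MR operator is removed as part of diamond query optimization only if it is a trivial one, that is, its plan has either two operators (a load followed by a store) or three operators (the operator between the load and the store must be a foreach, introduced by a cast operation).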

private boolean isDiamondMROper(MapReduceOper mr) {
    boolean rtn = false;
    // Only map-only MR operators are candidates.
    if (isMapOnly(mr)) {
        PhysicalPlan pl = mr.mapPlan;
        if (pl.size() == 2 || pl.size() == 3) {
            PhysicalOperator root = pl.getRoots().get(0);
            PhysicalOperator leaf = pl.getLeaves().get(0);
            if (root instanceof POLoad && leaf instanceof POStore) {
                if (pl.size() == 3) {
                    // Three operators: the middle one must be a foreach.
                    PhysicalOperator mid = pl.getSuccessors(root).get(0);
                    if (mid instanceof POForEach) {
                        rtn = true;
                    }
                } else {
                    // Two operators: a load followed directly by a store.
                    rtn = true;
                }
            }
        }
    }
    return rtn;
}
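To make the rule easier to try out without a Pig build, here is a minimal sketch of the same check. It is an illustration under stated assumptions, not Pig's implementation: `DiamondCheckSketch`, `OpKind`, and `isTrivialDiamond` are hypothetical names, and the plan is assumed to be a simple linear list from root to leaf rather than a real `PhysicalPlan`.

```java
import java.util.Arrays;
import java.util.List;

// Simplified stand-ins for illustration only; these are not Pig's classes.
final class DiamondCheckSketch {

    // Hypothetical operator kinds, standing in for POLoad, POForEach, POStore, etc.
    enum OpKind { LOAD, FOREACH, FILTER, STORE }

    /**
     * Returns true when a map-only plan is "trivial": load -> store,
     * or load -> foreach -> store. The plan is assumed to be linear and
     * listed from root to leaf.
     */
    static boolean isTrivialDiamond(boolean mapOnly, List<OpKind> plan) {
        if (!mapOnly) {
            return false;
        }
        int size = plan.size();
        if (size != 2 && size != 3) {
            return false;
        }
        if (plan.get(0) != OpKind.LOAD || plan.get(size - 1) != OpKind.STORE) {
            return false;
        }
        // With three operators, the middle one must be a foreach.
        return size == 2 || plan.get(1) == OpKind.FOREACH;
    }

    public static void main(String[] args) {
        System.out.println(isTrivialDiamond(true,
                Arrays.asList(OpKind.LOAD, OpKind.STORE)));
        System.out.println(isTrivialDiamond(true,
                Arrays.asList(OpKind.LOAD, OpKind.FOREACH, OpKind.STORE)));
        System.out.println(isTrivialDiamond(true,
                Arrays.asList(OpKind.LOAD, OpKind.FILTER, OpKind.STORE)));
    }
}
```

The first two plans qualify and the third does not, because a filter between the load and the store would have to run again in every successor.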

The isSplitteeMergeable method

private boolean isSplitteeMergeable(MapReduceOper splittee) {
    if (splittee.isGlobalSort() || splittee.isLimitAfterSort()) {
        log.info("Cannot merge this splittee: " +
                "it is global sort or limit after sort");
        return false;
    }

    PhysicalOperator leaf = splittee.mapPlan.getLeaves().get(0);
    if (!(leaf instanceof POLocalRearrange) &&
            !(leaf instanceof POSplit)) {
        log.info("Cannot merge this splittee: " +
                "its map plan doesn't end with LR or Split operator: "
                + leaf.getClass().getName());
        return false;
    }

    if (splittee.needsDistinctCombiner()) {
        log.info("Cannot merge this splittee: " +
                "it has distinct combiner.");
        return false;
    }

    return true;
}

if (splittee.isGlobalSort() || splittee.isLimitAfterSort())
Checks the sort type: the splittee cannot be a global sort or a limit after sort, because those use a different partitioner.

if (!(leaf instanceof POLocalRearrange) && !(leaf instanceof POSplit))
Checks the leaf of the map plan: a splittee is merged only if its map plan ends with a local rearrange or a split.

if (splittee.needsDistinctCombiner())
The splittee cannot require a distinct combiner, because that case uses a different combiner.
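The three guard checks can be sketched in isolation as follows. This is a simplified illustration, not Pig's code: `SplitteeMergeableSketch`, `Splittee`, `MapLeaf`, and `rejectionReason` are hypothetical names, and plain boolean fields stand in for the `isGlobalSort()`, `isLimitAfterSort()`, and `needsDistinctCombiner()` calls. Returning the reason instead of logging it is a choice made here only to keep the sketch testable.

```java
import java.util.Optional;

// Simplified stand-ins for illustration only; these are not Pig's classes.
final class SplitteeMergeableSketch {

    // Hypothetical kinds of operator at the end of a splittee's map plan.
    enum MapLeaf { LOCAL_REARRANGE, SPLIT, OTHER }

    static final class Splittee {
        final boolean globalSort;
        final boolean limitAfterSort;
        final boolean needsDistinctCombiner;
        final MapLeaf mapLeaf;

        Splittee(boolean globalSort, boolean limitAfterSort,
                 boolean needsDistinctCombiner, MapLeaf mapLeaf) {
            this.globalSort = globalSort;
            this.limitAfterSort = limitAfterSort;
            this.needsDistinctCombiner = needsDistinctCombiner;
            this.mapLeaf = mapLeaf;
        }
    }

    /** Returns the reason a splittee cannot be merged, or empty if it can. */
    static Optional<String> rejectionReason(Splittee s) {
        // Checks run in the same order as in the method shown above.
        if (s.globalSort || s.limitAfterSort) {
            return Optional.of("global sort or limit after sort");
        }
        if (s.mapLeaf != MapLeaf.LOCAL_REARRANGE && s.mapLeaf != MapLeaf.SPLIT) {
            return Optional.of("map plan does not end with LR or Split");
        }
        if (s.needsDistinctCombiner) {
            return Optional.of("distinct combiner");
        }
        return Optional.empty();
    }

    static boolean isMergeable(Splittee s) {
        return !rejectionReason(s).isPresent();
    }

    public static void main(String[] args) {
        Splittee plain = new Splittee(false, false, false, MapLeaf.LOCAL_REARRANGE);
        Splittee sorted = new Splittee(true, false, false, MapLeaf.LOCAL_REARRANGE);
        System.out.println(isMergeable(plain));
        System.out.println(rejectionReason(sorted).orElse("mergeable"));
    }
}
```

Because each check returns early, a splittee that fails several conditions reports only the first one, which matches the single log message the real method emits per splittee.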

The mergeMapReduceSplittees method

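The splitter has a non-empty reducer, so the MR splittees cannot be merged into the splitter itself. Instead, multiple splittees (if there are any) are merged into a new MR operator, which is then connected to the splitter.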

private int mergeMapReduceSplittees(List<MapReduceOper> mapReducers,
        MapReduceOper splitter) throws VisitorException {

    List<MapReduceOper> mergeList = getMergeList(splitter, mapReducers);

    if (mergeList.size() <= 1) {
        return 0;
    }

    MapReduceOper mrOper = getMROper();

    MapReduceOper splittee = mergeList.get(0);
    PhysicalPlan pl = splittee.mapPlan;
    POLoad load = (POLoad)pl.getRoots().get(0);

    mrOper.mapPlan.add(load);
    try {
        mrOper.mapPlan.addAsLeaf(getStore());
    } catch (PlanException e) {
        int errCode = 2137;
        String msg = "Internal Error. Unable to add store to the plan as leaf for optimization.";
        throw new OptimizerException(msg, errCode, PigException.BUG, e);
    }

    try {
        getPlan().add(mrOper);
        getPlan().connect(splitter, mrOper);
    } catch (PlanException e) {
        int errCode = 2133;
        String msg = "Internal Error. Unable to connect splitter with successors for optimization.";
        throw new OptimizerException(msg, errCode, PigException.BUG, e);
    }

    mergeAllMapReduceSplittees(mergeList, mrOper, getSplit());

    return (mergeList.size() - 1);
}

try {
    mrOper.mapPlan.addAsLeaf(getStore());
} catch (PlanException e) {
    int errCode = 2137;
    String msg = "Internal Error. Unable to add store to the plan as leaf for optimization.";
    throw new OptimizerException(msg, errCode, PigException.BUG, e);
}
Adds a dummy store operator, which is later replaced by the split operator.

try {
    getPlan().add(mrOper);
    getPlan().connect(splitter, mrOper);
} catch (PlanException e) {
    int errCode = 2133;
    String msg = "Internal Error. Unable to connect splitter with successors for optimization.";
    throw new OptimizerException(msg, errCode, PigException.BUG, e);
}
Connects the new MR operator to the splitter.

mergeAllMapReduceSplittees(mergeList, mrOper, getSplit());
Merges the splittees into the new MR operator.
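The overall flow of this method can be sketched with toy types. This is only an illustration under assumptions: `MergeSplitteesSketch`, `MrOp`, and `mergeSplittees` are hypothetical names, map plans are reduced to lists of strings, and the body of `mergeAllMapReduceSplittees` is not shown in this article, so step 3 below simply assumes that it swaps the placeholder store for a split with one branch per splittee and detaches the merged splittees from the splitter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified stand-ins for illustration only; these are not Pig's classes.
final class MergeSplitteesSketch {

    // Hypothetical MR operator: a name, a linear map plan, and its successors.
    static final class MrOp {
        final String name;
        final List<String> mapPlan = new ArrayList<>();
        final List<MrOp> successors = new ArrayList<>();

        MrOp(String name) {
            this.name = name;
        }
    }

    /** Merges the splittees into one new operator under the splitter. */
    static int mergeSplittees(MrOp splitter, List<MrOp> mergeList) {
        if (mergeList.size() <= 1) {
            return 0; // merging zero or one splittee gains nothing
        }

        // 1. Build the new operator: a load plus a placeholder store.
        MrOp merged = new MrOp("merged");
        merged.mapPlan.add("load");
        merged.mapPlan.add("dummy-store");

        // 2. Connect the new operator to the splitter.
        splitter.successors.add(merged);

        // 3. Assumed behavior: replace the placeholder with a split that has
        //    one branch per splittee, then detach the merged splittees.
        List<String> branches = new ArrayList<>();
        for (MrOp splittee : mergeList) {
            branches.add(splittee.name);
        }
        merged.mapPlan.set(1, "split" + branches);
        splitter.successors.removeAll(mergeList);

        // N splittees collapse into one operator.
        return mergeList.size() - 1;
    }

    public static void main(String[] args) {
        MrOp splitter = new MrOp("splitter");
        List<MrOp> splittees = Arrays.asList(new MrOp("a"), new MrOp("b"), new MrOp("c"));
        splitter.successors.addAll(splittees);
        System.out.println(mergeSplittees(splitter, splittees));
        System.out.println(splitter.successors.get(0).mapPlan);
    }
}
```

The return value of `mergeList.size() - 1` reads naturally as the net number of MR operators saved: N splittees become a single new operator.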