Shandong University Software Engineering Application and Practice: Pig Code Analysis (12)

2021SC@SDUSC

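Overview
This post continues the analysis of Pig, the lightweight scripting language for driving Hadoop, and looks at the code of the MultiQueryOptimizer class in the executionengine package.
The class extends MROpPlanVisitor and implements an optimizer that merges all or some of the splittee MapReduceOpers into the splitter MapReduceOper. It does this by replacing each POLoad/POStore combination with a POSplit operator.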
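
To make the load/store-to-split idea concrete, here is a toy sketch. It is not Pig's API: plans are modeled as plain lists of operator names, and the names ToyMultiQueryMerge, merge and Split are hypothetical. It assumes the splitter's plan ends with a store to a temporary file and each splittee's plan starts with a load of that same file; merging drops those loads and replaces the store with a split node that carries one sub-plan per splittee.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model (hypothetical, not Pig's API): a plan is a linear list of operator names.
 * The splitter's plan ends with "Store(tmp)"; each splittee's plan starts with "Load(tmp)".
 */
public class ToyMultiQueryMerge {

    /** A POSplit-like node: holds one sub-plan per merged splittee. */
    static final class Split {
        final List<List<String>> subPlans = new ArrayList<>();

        @Override
        public String toString() {
            return "Split" + subPlans;
        }
    }

    /**
     * Replaces the splitter's trailing store with a Split whose sub-plans are
     * the splittees' plans minus their leading load of the same temp file.
     */
    static List<Object> merge(List<String> splitter, List<List<String>> splittees) {
        String store = splitter.get(splitter.size() - 1);
        if (!store.startsWith("Store(")) {
            throw new IllegalArgumentException("splitter must end with a store");
        }
        String tmp = store.substring("Store(".length(), store.length() - 1);
        Split split = new Split();
        for (List<String> splittee : splittees) {
            if (!splittee.get(0).equals("Load(" + tmp + ")")) {
                throw new IllegalArgumentException("splittee must load " + tmp);
            }
            // Drop the load: the split hands its records straight to this sub-plan.
            split.subPlans.add(new ArrayList<>(splittee.subList(1, splittee.size())));
        }
        // Keep everything before the store, then put the split where the store was.
        List<Object> merged = new ArrayList<>(splitter.subList(0, splitter.size() - 1));
        merged.add(split);
        return merged;
    }

    public static void main(String[] args) {
        List<String> splitter = List.of("Load(input)", "Filter", "Store(tmp1)");
        List<List<String>> splittees = List.of(
                List.of("Load(tmp1)", "Foreach", "Store(out1)"),
                List.of("Load(tmp1)", "LocalRearrange"));
        System.out.println(merge(splitter, splittees));
    }
}
```

The point of the rewrite is that the temporary file is never written and read back: one job now produces both outputs.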

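isDiamondMROper method

An MR operator is removed as part of the diamond query optimization only when it is a trivial map-only MR, that is, when its map plan has either two operators (a load followed by a store) or three operators, in which case the operator between the load and the store must be a foreach introduced by a cast operation.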

private boolean isDiamondMROper(MapReduceOper mr) {
        boolean rtn = false;
        if (isMapOnly(mr)) {
            PhysicalPlan pl = mr.mapPlan;
            if (pl.size() == 2 || pl.size() == 3) {
                PhysicalOperator root = pl.getRoots().get(0);
                PhysicalOperator leaf = pl.getLeaves().get(0);
                if (root instanceof POLoad && leaf instanceof POStore) {
                    if (pl.size() == 3) {
                        PhysicalOperator mid = pl.getSuccessors(root).get(0);
                        if (mid instanceof POForEach) {
                            rtn = true;
                        }
                    } else {
                        rtn = true;
                    }
                }
            }
        }
        return rtn;
    }
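
The same check can be restated on a simplified model. In this hypothetical sketch (not Pig's API), a map plan is a linear list of operator kinds and the mapOnly flag stands in for isMapOnly(mr). A real PhysicalPlan is a DAG, which is why the original reads the root, the leaf and the root's successor separately.

```java
import java.util.List;

/** Hypothetical simplified model of isDiamondMROper; not Pig's API. */
public class DiamondCheckSketch {

    /** A few operator kinds, enough to illustrate the rule. */
    enum Op { LOAD, FOREACH, FILTER, STORE }

    /** True if the plan is map-only and is load->store or load->foreach->store. */
    static boolean isTrivialMapOnlyPlan(boolean mapOnly, List<Op> mapPlan) {
        if (!mapOnly) {
            return false;
        }
        int n = mapPlan.size();
        if (n != 2 && n != 3) {
            return false;
        }
        if (mapPlan.get(0) != Op.LOAD || mapPlan.get(n - 1) != Op.STORE) {
            return false;
        }
        // With three operators, the middle one must be a foreach (introduced by a cast).
        return n == 2 || mapPlan.get(1) == Op.FOREACH;
    }

    public static void main(String[] args) {
        System.out.println(isTrivialMapOnlyPlan(true, List.of(Op.LOAD, Op.STORE)));             // true
        System.out.println(isTrivialMapOnlyPlan(true, List.of(Op.LOAD, Op.FOREACH, Op.STORE))); // true
        System.out.println(isTrivialMapOnlyPlan(true, List.of(Op.LOAD, Op.FILTER, Op.STORE)));  // false
        System.out.println(isTrivialMapOnlyPlan(false, List.of(Op.LOAD, Op.STORE)));            // false
    }
}
```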

isSplitteeMergeable method

private boolean isSplitteeMergeable(MapReduceOper splittee) {
        if (splittee.isGlobalSort() || splittee.isLimitAfterSort()) {
            log.info("Cannot merge this splittee: " +
                    "it is global sort or limit after sort");
            return false;
        }

        PhysicalOperator leaf = splittee.mapPlan.getLeaves().get(0);
        if (!(leaf instanceof POLocalRearrange) &&
                ! (leaf instanceof POSplit)) {
            log.info("Cannot merge this splittee: " +
                    "its map plan doesn't end with LR or Split operator: "
                    + leaf.getClass().getName());
            return false;
        }

        if (splittee.needsDistinctCombiner()) {
            log.info("Cannot merge this splittee: " +
                    "it has distinct combiner.");
            return false;
        }

        return true;
    }

if (splittee.isGlobalSort() || splittee.isLimitAfterSort())
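This checks the sort type: a splittee that is a global sort or a limit after sort cannot be merged, because those jobs use a different partitioner.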

if (!(leaf instanceof POLocalRearrange) && !(leaf instanceof POSplit))
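This checks the leaf of the map plan: only splittees whose map plan ends with a local rearrange (POLocalRearrange) or a split (POSplit) are merged.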

if (splittee.needsDistinctCombiner())
The splittee cannot have a distinct combiner, because that uses a different combiner.
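
The three checks can be mirrored in a small, self-contained sketch. The Splittee class and MapLeaf enum below are hypothetical stand-ins for the MapReduceOper properties the method inspects (isGlobalSort, isLimitAfterSort, the map plan's leaf, needsDistinctCombiner); they are not Pig types. The sketch returns the rejection reason instead of logging it, reusing the original log messages.

```java
import java.util.Optional;

/** Hypothetical stand-in for the checks in isSplitteeMergeable; not Pig's API. */
public class SplitteeMergeableSketch {

    /** Kind of the last operator in the splittee's map plan. */
    enum MapLeaf { LOCAL_REARRANGE, SPLIT, STORE, OTHER }

    /** The properties of a splittee that the checks look at. */
    static final class Splittee {
        final boolean globalSort;
        final boolean limitAfterSort;
        final MapLeaf mapLeaf;
        final boolean distinctCombiner;

        Splittee(boolean globalSort, boolean limitAfterSort, MapLeaf mapLeaf, boolean distinctCombiner) {
            this.globalSort = globalSort;
            this.limitAfterSort = limitAfterSort;
            this.mapLeaf = mapLeaf;
            this.distinctCombiner = distinctCombiner;
        }
    }

    /** Empty if the splittee can be merged, otherwise the first reason it cannot. */
    static Optional<String> rejectionReason(Splittee s) {
        if (s.globalSort || s.limitAfterSort) {
            // These jobs use a different partitioner.
            return Optional.of("it is global sort or limit after sort");
        }
        if (s.mapLeaf != MapLeaf.LOCAL_REARRANGE && s.mapLeaf != MapLeaf.SPLIT) {
            return Optional.of("its map plan doesn't end with LR or Split operator");
        }
        if (s.distinctCombiner) {
            // A distinct combiner is a different combiner.
            return Optional.of("it has distinct combiner");
        }
        return Optional.empty();
    }

    static boolean isMergeable(Splittee s) {
        return !rejectionReason(s).isPresent();
    }

    public static void main(String[] args) {
        Splittee lr = new Splittee(false, false, MapLeaf.LOCAL_REARRANGE, false);
        Splittee sorted = new Splittee(true, false, MapLeaf.LOCAL_REARRANGE, false);
        System.out.println(isMergeable(lr));                      // true
        System.out.println(rejectionReason(sorted).orElse("ok")); // it is global sort or limit after sort
    }
}
```

As in the original, the order of the checks decides which reason is reported when several apply.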

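mergeMapReduceSplittees method

In this case the splitter has a non-empty reducer, so the MR splittees cannot be merged into the splitter itself. Instead, the method merges multiple splittees (if there are any) into a new MR operator and connects that operator to the splitter. The return value, mergeList.size() - 1, is the number of MR operators saved; the method returns 0 when there is at most one splittee to merge.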

private int mergeMapReduceSplittees(List<MapReduceOper> mapReducers,
            MapReduceOper splitter) throws VisitorException {

        List<MapReduceOper> mergeList = getMergeList(splitter, mapReducers);

        if (mergeList.size() <= 1) {
            return  0;
        }

        MapReduceOper mrOper = getMROper();

        MapReduceOper splittee = mergeList.get(0);
        PhysicalPlan pl = splittee.mapPlan;
        POLoad load = (POLoad)pl.getRoots().get(0);

        mrOper.mapPlan.add(load);
        try {
            mrOper.mapPlan.addAsLeaf(getStore());
        } catch (PlanException e) {
            int errCode = 2137;
            String msg = "Internal Error. Unable to add store to the plan as leaf for optimization.";
            throw new OptimizerException(msg, errCode, PigException.BUG, e);
        }

        try {
            getPlan().add(mrOper);
            getPlan().connect(splitter, mrOper);
        } catch (PlanException e) {
            int errCode = 2133;
            String msg = "Internal Error. Unable to connect splitter with successors for optimization.";
            throw new OptimizerException(msg, errCode, PigException.BUG, e);
        }

        mergeAllMapReduceSplittees(mergeList, mrOper, getSplit());

        return (mergeList.size() - 1);
    }

try {
    mrOper.mapPlan.addAsLeaf(getStore());
} catch (PlanException e) {
    int errCode = 2137;
    String msg = "Internal Error. Unable to add store to the plan as leaf for optimization.";
    throw new OptimizerException(msg, errCode, PigException.BUG, e);
}
This adds a dummy store operator, which will later be replaced by the split operator.

try {
    getPlan().add(mrOper);
    getPlan().connect(splitter, mrOper);
} catch (PlanException e) {
    int errCode = 2133;
    String msg = "Internal Error. Unable to connect splitter with successors for optimization.";
    throw new OptimizerException(msg, errCode, PigException.BUG, e);
}
This connects the new MR operator to the splitter.

mergeAllMapReduceSplittees(mergeList, mrOper, getSplit());
This merges the splittees into the new MR operator.
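
The effect of these steps on the MR plan graph, and why the method returns mergeList.size() - 1, can be sketched with a toy adjacency map. This is a hypothetical model, not Pig's API: operators are strings, the plan maps each operator to its successors, and mergeList is passed in directly instead of coming from getMergeList. The rewiring inside the loop (each splittee is removed and its successors are reattached to the new operator) is an assumption about what mergeAllMapReduceSplittees does to the graph; the sketch ignores how the map and reduce plans themselves are merged.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model (hypothetical, not Pig's API): plan maps operator name -> successors. */
public class MergeSplitteesSketch {

    static int mergeSplittees(Map<String, List<String>> plan, String splitter,
                              List<String> mergeList, String newOper) {
        // Merging fewer than two splittees saves nothing, mirroring the size() <= 1 check.
        if (mergeList.size() <= 1) {
            return 0;
        }
        // Add the new MR operator and connect the splitter to it.
        plan.put(newOper, new ArrayList<>());
        plan.get(splitter).add(newOper);
        for (String splittee : mergeList) {
            // Assumed rewiring: the splittee disappears and its successors
            // are reattached to the new operator.
            plan.get(splitter).remove(splittee);
            List<String> successors = plan.remove(splittee);
            if (successors != null) {
                plan.get(newOper).addAll(successors);
            }
        }
        // n splittees were replaced by one operator: n - 1 fewer MR jobs.
        return mergeList.size() - 1;
    }

    public static void main(String[] args) {
        Map<String, List<String>> plan = new LinkedHashMap<>();
        plan.put("splitter", new ArrayList<>(List.of("A", "B", "C")));
        plan.put("A", new ArrayList<>(List.of("X")));
        plan.put("B", new ArrayList<>());
        plan.put("C", new ArrayList<>());
        plan.put("X", new ArrayList<>());
        int removed = mergeSplittees(plan, "splitter", List.of("A", "B"), "merged");
        // prints: 1 {splitter=[C, merged], C=[], X=[], merged=[X]}
        System.out.println(removed + " " + plan);
    }
}
```

Here C stands for a splittee that was not in the merge list, so it stays attached to the splitter as a separate job.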

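Summary

This concludes this semester's code analysis of Apache Pig. Overall, the analysis gave me a much clearer understanding of the back end of a data-analysis tool such as Pig.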
