2021SC@SDUSC
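Overview
This is the second article analyzing the LogicalPlanBuilder class. It follows the methods that are called while the logical execution plan is generated. Generation proceeds in three steps: first, each node of the syntax tree is converted into the corresponding LogicalPlan node; next, these nodes form the resolved logical operator tree; finally, a series of optimization rules is applied to that tree, rewriting inefficient structures while keeping the result correct, which yields the optimized logical operator tree. LogicalPlanBuilder is responsible for building each Operator. An Operator carries its name, its Schema, and its run-time parameters wrapped as LogicalExpressionPlan objects, along with user-defined Hadoop MapReduce settings such as requestedParallelism and mCustomPartitioner.
Code analysis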
getProjectExpList()
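This method returns the list of projection expressions. For each LogicalExpressionPlan in the input list it takes the first source as the expression. Because each ProjectExpression was originally attached to the cube operator, the method then reattaches every operator in the plan to the specified relational operator (lro); an operator that is not a ProjectExpression raises a FrontendException ("Column project expected."). The helper injectForeachOperator(), listed after it, connects the predecessors of the cube operator to the foreach operator, disconnects the cube operator from those predecessors, and finally removes the cube operator from the plan.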
private List<LogicalExpression> getProjectExpList(List<LogicalExpressionPlan> lexpPlanList,
        LogicalRelationalOperator lro) throws FrontendException {
    List<LogicalExpression> leList = new ArrayList<LogicalExpression>();
    for (int i = 0; i < lexpPlanList.size(); i++) {
        LogicalExpressionPlan lexp = lexpPlanList.get(i);
        LogicalExpression lex = (LogicalExpression) lexp.getSources().get(0);
        Iterator<Operator> opers = lexp.getOperators();
        while (opers.hasNext()) {
            Operator oper = opers.next();
            try {
                ((ProjectExpression) oper).setAttachedRelationalOp(lro);
            } catch (ClassCastException cce) {
                throw new FrontendException("Column project expected.", cce);
            }
        }
        leList.add(lex);
    }
    return leList;
}
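The attach-or-fail loop in getProjectExpList() can be illustrated in isolation. The following is only a sketch under stated assumptions: Op, Project, Constant and attachAll are hypothetical stand-ins rather than Pig classes, and the sketch checks the type with instanceof instead of catching ClassCastException as the original does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class AttachSketch {
    // Hypothetical stand-ins for Pig's operator types; not the real classes.
    interface Op {}

    static final class Project implements Op {
        String attachedTo; // name of the relational operator this projection belongs to
    }

    static final class Constant implements Op {}

    // Attaches every operator of every plan to `target` and returns how many
    // were attached. Fails on the first operator that is not a projection.
    static int attachAll(List<List<Op>> plans, String target) {
        int attached = 0;
        for (List<Op> plan : plans) {
            for (Op op : plan) {
                if (!(op instanceof Project)) {
                    throw new IllegalStateException("Column project expected.");
                }
                ((Project) op).attachedTo = target;
                attached++;
            }
        }
        return attached;
    }

    public static void main(String[] args) {
        List<List<Op>> plans = new ArrayList<>();
        plans.add(Arrays.<Op>asList(new Project(), new Project()));
        System.out.println(attachAll(plans, "foreach")); // prints 2
    }
}
```

Both styles reject a plan that contains anything other than a projection; the explicit instanceof check simply makes the precondition visible before the cast.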
private void injectForeachOperator(SourceLocation loc, LOCube op, LOForEach foreach)
        throws FrontendException {
    // connect the foreach operator with predecessors of cube operator
    List<Operator> opers = op.getPlan().getPredecessors(op);
    for (Operator oper : opers) {
        OperatorPlan foreachPlan = foreach.getPlan();
        foreachPlan.connect(oper, (Operator) foreach);
    }
    opers = foreach.getPlan().getPredecessors(foreach);
    for (Operator lop : opers) {
        List<Operator> succs = lop.getPlan().getSuccessors(lop);
        for (Operator succ : succs) {
            if (succ instanceof LOCube) {
                succ.getPlan().disconnect(lop, succ);
                succ.getPlan().remove(succ);
            }
        }
    }
}
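The rewiring done by injectForeachOperator() amounts to putting one node in the place of another in a DAG. A minimal sketch of that idea follows, assuming a toy adjacency-list graph keyed by node name; RewireSketch and its methods are hypothetical and do not reproduce Pig's OperatorPlan API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RewireSketch {
    // Hypothetical minimal DAG: node name -> successor names.
    private final Map<String, List<String>> succs = new LinkedHashMap<>();

    void add(String node) {
        succs.putIfAbsent(node, new ArrayList<>());
    }

    void connect(String from, String to) {
        add(from);
        add(to);
        if (!succs.get(from).contains(to)) {
            succs.get(from).add(to);
        }
    }

    List<String> predecessors(String node) {
        List<String> preds = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : succs.entrySet()) {
            if (e.getValue().contains(node)) {
                preds.add(e.getKey());
            }
        }
        return preds;
    }

    List<String> successors(String node) {
        return new ArrayList<>(succs.getOrDefault(node, new ArrayList<>()));
    }

    boolean contains(String node) {
        return succs.containsKey(node);
    }

    // Connects every predecessor of `old` to `replacement`, then disconnects
    // and removes `old`. Successors of `old` are discarded; the sketch
    // assumes `old` has none.
    void replaceNode(String old, String replacement) {
        List<String> preds = predecessors(old); // snapshot taken before mutating
        for (String p : preds) {
            connect(p, replacement);
        }
        for (String p : preds) {
            succs.get(p).remove(old);
        }
        succs.remove(old);
    }

    public static void main(String[] args) {
        RewireSketch plan = new RewireSketch();
        plan.connect("load", "cube");
        plan.add("foreach");
        plan.replaceNode("cube", "foreach");
        System.out.println(plan.successors("load")); // prints [foreach]
        System.out.println(plan.contains("cube"));   // prints false
    }
}
```

The sketch copies the predecessor list before changing any edges, so the loops never iterate over a list that is being modified.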
combineCubeOperations()
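This method combines CUBE operations. If several CUBE operations occur consecutively, they can be merged: CUBE rel BY CUBE(a,b), CUBE(c,d) becomes CUBE rel BY CUBE(a,b,c,d). The method scans the operation list and merges the column projections of each run, where startIdx marks the first CUBE of a run and endIdx the last. The check after the loop is required when a run of CUBE operations occurs at the end of the list, as in (CUBE, ROLLUP, CUBE, CUBE); in that case endIdx is greater than startIdx when the loop exits. If anything was merged, the column projections that were marked for deletion are then removed.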
private void combineCubeOperations(ArrayList<String> operations,
        MultiMap<Integer, LogicalExpressionPlan> expressionPlans) {
    int startIdx = -1;
    int endIdx = -1;
    int i = 0;
    boolean isMerged = false;
    for (i = 0; i < operations.size(); i++) {
        if ((startIdx == -1) && (operations.get(i).equals("CUBE") == true)) {
            startIdx = i;
        } else {
            if (operations.get(i).equals("CUBE") == true) {
                endIdx = i;
            } else {
                if (endIdx > startIdx) {
                    mergeAndMarkForDelete(operations, expressionPlans, startIdx, endIdx);
                    isMerged = true;
                    startIdx = -1;
                    endIdx = -1;
                } else {
                    startIdx = -1;
                    endIdx = -1;
                }
            }
        }
    }
    if (endIdx > startIdx) {
        isMerged = true;
        mergeAndMarkForDelete(operations, expressionPlans, startIdx, endIdx);
    }
    if (isMerged) {
        performDeletion(expressionPlans, operations);
    }
}
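The index bookkeeping in combineCubeOperations() is easier to follow when separated from the merge itself. The sketch below keeps only the scan: it reports the [startIdx, endIdx] pair of every run of two or more consecutive "CUBE" entries. CubeRunSketch and findRuns are hypothetical names, and the actual merge and deletion steps are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CubeRunSketch {
    // Returns one {start, end} pair per run of two or more consecutive
    // "CUBE" entries, using the same startIdx/endIdx scheme as the method above.
    static List<int[]> findRuns(List<String> operations) {
        List<int[]> runs = new ArrayList<>();
        int start = -1;
        int end = -1;
        for (int i = 0; i < operations.size(); i++) {
            boolean isCube = "CUBE".equals(operations.get(i));
            if (isCube && start == -1) {
                start = i;                 // first CUBE of a possible run
            } else if (isCube) {
                end = i;                   // run extends to this index
            } else {
                if (end > start) {
                    runs.add(new int[] {start, end});
                }
                start = -1;                // a non-CUBE entry ends the run
                end = -1;
            }
        }
        if (end > start) {                 // run that reaches the end of the list
            runs.add(new int[] {start, end});
        }
        return runs;
    }

    public static void main(String[] args) {
        List<String> ops = Arrays.asList("CUBE", "ROLLUP", "CUBE", "CUBE");
        for (int[] run : findRuns(ops)) {
            System.out.println(Arrays.toString(run)); // prints [2, 3]
        }
    }
}
```

A lone CUBE never forms a run, because endIdx stays at -1 and the test endIdx > startIdx fails; this is why the leading CUBE in the example is not reported.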
processExpressionPlan()
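This method resolves the projection expressions inside an expression plan. It iterates over all operators in the plan and handles each ProjectExpression in one of four ways. If the expression is a range projection, an LOInnerLoad is created from a copy of the expression and used as input. Otherwise, if the column alias is not null, the projection uses a column alias: when the projected operator is not null, the expression refers to a relation in the nested foreach, so that relation is added to the inputs of LOGenerate (if it is not already there), the projection's input number is set to its index, and the column number is set to -1; when the projected operator is null, the expression refers to a column in the input of the foreach, so an LOInnerLoad is created for that alias and used as input. Finally, if there is no alias, the expression refers to a column of the foreach input by position (for example $1), and an LOInnerLoad is created for that column number.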
private static void processExpressionPlan(LOForEach foreach,
                                          LogicalPlan lp,
                                          LogicalExpressionPlan plan,
                                          ArrayList<Operator> inputs ) throws FrontendException {
    Iterator<Operator> it = plan.getOperators();
    while( it.hasNext() ) {
        Operator sink = it.next();
        //check all ProjectExpression
        if( sink instanceof ProjectExpression ) {
            ProjectExpression projExpr = (ProjectExpression)sink;
            String colAlias = projExpr.getColAlias();
            if( projExpr.isRangeProject()){
                LOInnerLoad innerLoad = new LOInnerLoad( lp, foreach,
                        new ProjectExpression(projExpr, new LogicalExpressionPlan())
                );
                setupInnerLoadAndProj(innerLoad, projExpr, lp, inputs);
            } else if( colAlias != null ) {
                // the project is using a column alias
                Operator op = projExpr.getProjectedOperator();
                if( op != null ) {
                    // this means the project expression refers to a relation
                    // in the nested foreach
                    //add the relation to inputs of LOGenerate and set
                    // projection input
                    int index = inputs.indexOf( op );
                    if( index == -1 ) {
                        index = inputs.size();
                        inputs.add( op );
                    }
                    projExpr.setInputNum( index );
                    projExpr.setColNum( -1 );
                } else {
                    // this means the project expression refers to a column
                    // in the input of foreach. Add a LOInnerLoad and use that
                    // as input
                    LOInnerLoad innerLoad = new LOInnerLoad( lp, foreach, colAlias );
                    setupInnerLoadAndProj(innerLoad, projExpr, lp, inputs);
                }
            } else {
                // the project expression is referring to column in ForEach input
                // using position (eg $1)
                LOInnerLoad innerLoad = new LOInnerLoad( lp, foreach, projExpr.getColNum() );
                setupInnerLoadAndProj(innerLoad, projExpr, lp, inputs);
            }
        }
    }
}
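The branch order in processExpressionPlan() and its "find the index or append" handling of inputs can be sketched without any Pig types. In the following sketch, ProjectDispatchSketch, Kind, classify and indexOrAppend are hypothetical names; the parameters of classify stand in for isRangeProject(), getColAlias() and the test getProjectedOperator() != null.

```java
import java.util.ArrayList;
import java.util.List;

final class ProjectDispatchSketch {
    enum Kind { RANGE, ALIAS_TO_RELATION, ALIAS_TO_COLUMN, POSITION }

    // Mirrors the order of the checks in the method above: range projection
    // first, then alias (relation or column), then positional reference.
    static Kind classify(boolean isRange, String colAlias, boolean hasProjectedOperator) {
        if (isRange) {
            return Kind.RANGE;
        }
        if (colAlias != null) {
            return hasProjectedOperator ? Kind.ALIAS_TO_RELATION : Kind.ALIAS_TO_COLUMN;
        }
        return Kind.POSITION;
    }

    // Returns the index of `op` in `inputs`, appending it first if it is absent,
    // so that each distinct input is registered exactly once.
    static <T> int indexOrAppend(List<T> inputs, T op) {
        int index = inputs.indexOf(op);
        if (index == -1) {
            index = inputs.size();
            inputs.add(op);
        }
        return index;
    }

    public static void main(String[] args) {
        System.out.println(classify(false, "a", true));    // prints ALIAS_TO_RELATION
        System.out.println(classify(false, null, false));  // prints POSITION

        List<String> inputs = new ArrayList<>();
        System.out.println(indexOrAppend(inputs, "relA")); // prints 0
        System.out.println(indexOrAppend(inputs, "relB")); // prints 1
        System.out.println(indexOrAppend(inputs, "relA")); // prints 0
    }
}
```

Because the range check comes first, a range projection is handled as RANGE even when it also carries an alias, which matches the order of the if/else chain in the original method.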
Summary
This article analyzed the parts of the LogicalPlanBuilder class that merge CUBE operations and obtain the list of projection expressions.