CodeGen in Compute Engines

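1 Interpreted Execution and Compiled Execution

  Interpreted execution: the program is not turned into machine code ahead of time. An interpreter reads it and executes it one statement (or bytecode instruction) at a time, so execution is slow.

  Compiled execution: machine code is generated directly, so execution is fast, but development is harder and the result is tied to one platform, so portability is poor.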

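2 JIT

  JIT (just in time) means just-in-time compilation.

  Java actually uses both interpreted and compiled execution; the compiled part relies on JIT.

  When the JVM finds that a method or code block runs especially often, it treats it as "hot spot code". The JVM uses the JIT compiler to compile the hot spot code into optimized native machine code and runs that machine code from then on, which is faster. The compiled instructions are stored in the Code Cache, a dedicated non-heap memory region that is separate from the method area.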

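2.1 Front-end and Back-end Compilation

  Front-end compilation is the part tied to the source language, that is, turning .java files into .class files.

  Back-end compilation is the part tied to the target, that is, turning .class files into machine code. JIT is back-end compilation.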

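2.2 Why not JIT-compile all of the code?

  First, if a piece of code runs only once, compiling it is wasted effort: interpreting the bytecode once is much faster than compiling it to machine code and then running it.

  Second, optimization quality: the more often the JVM executes a method or iterates a loop, the more it learns about how the code behaves, and the better the optimizations it can apply when it finally compiles that code.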

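2.3 Hot Spot Detection

  The JVM detects hot spots with counters. There are two main ones: the method invocation counter and the loop back-edge counter.

  The counters do not accumulate forever. They are bounded in time and decay periodically, and there are JVM flags to tune this.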
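  As a rough illustration of the two counters, the sketch below has one small method that is invoked very often and one loop that iterates many times. The class and method names are made up for this example. Running it on a HotSpot JVM with the diagnostic flag -XX:+PrintCompilation should list the methods as they get compiled, though the exact thresholds and output depend on the JVM version and settings.

```java
// Hypothetical demo class; the names are illustrative only.
class HotSpotDemo {
    // Called once per loop iteration: a candidate for the method invocation counter.
    static long square(long x) {
        return x * x;
    }

    // Each backward jump of this loop counts towards the back-edge counter.
    static long sumOfSquares(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += square(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Enough iterations that a typical JVM will treat both methods as hot.
        System.out.println(sumOfSquares(1_000_000));
    }
}
```

  Usage, assuming the class is compiled in the current directory: java -XX:+PrintCompilation HotSpotDemo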

3 Flink CodeGen

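  Flink SQL already uses CodeGen, but not for CPU performance. It is used when translating the physical plan into Transformations. Performance-oriented CodeGen features have been developed in the most recent versions.

  Take CommonExecCalc as an example: its translateToPlanInternal method calls CodeGen to do the translation: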

final CodeGenOperatorFactory<RowData> substituteStreamOperator =
        CalcCodeGenerator.generateCalcOperator(
                ctx,
                inputTransform,
                (RowType) getOutputType(),
                JavaScalaConversionUtil.toScala(projection),
                JavaScalaConversionUtil.toScala(Optional.ofNullable(this.condition)),
                retainHeader,
                getClass().getSimpleName());

  CalcCodeGenerator contains code generation logic like the following; the generated code is the string template inside it:

if (onlyFilter) {
  s"""
     |${if (eagerInputUnboxingCode) ctx.reuseInputUnboxingCode() else ""}
     |${filterCondition.code}
     |if (${filterCondition.resultTerm}) {
     |  ${produceOutputCode(inputTerm)}
     |}
     |""".stripMargin

  The next step, the method that produces the operator, looks even more like ordinary hand-written business code:

val operatorCode =
  j"""
  public class $operatorName extends ${abstractBaseClass.getCanonicalName}
      implements ${baseClass.getCanonicalName}$endInputImpl {

    private final Object[] references;
    ${ctx.reuseMemberCode()}

    public $operatorName(
        Object[] references,
        ${className[StreamTask[_, _]]} task,
        ${className[StreamConfig]} config,
        ${className[Output[_]]} output,
        ${className[ProcessingTimeService]} processingTimeService) throws Exception {
      this.references = references;
      ${ctx.reuseInitCode()}
      this.setup(task, config, output);
      if (this instanceof ${className[AbstractStreamOperator[_]]}) {
        ((${className[AbstractStreamOperator[_]]}) this)
          .setProcessingTimeService(processingTimeService);
      }
    }

    @Override
    public void open() throws Exception {
      super.open();
      ${ctx.reuseOpenCode()}
    }

    @Override
    public void processElement($STREAM_RECORD $ELEMENT) throws Exception {
      $inputTypeTerm $inputTerm = ($inputTypeTerm) ${converter(s"$ELEMENT.getValue()")};
      ${ctx.reusePerRecordCode()}
      ${ctx.reuseLocalVariableCode()}
      ${if (lazyInputUnboxingCode) "" else ctx.reuseInputUnboxingCode()}
      $processCode
    }

    $endInput

    @Override
    public void finish() throws Exception {
        ${ctx.reuseFinishCode()}
        super.finish();
    }

    @Override
    public void close() throws Exception {
       super.close();
       ${ctx.reuseCloseCode()}
    }

    ${ctx.reuseInnerClassDefinitionCode()}
  }
""".stripMargin

  Compilation happens in CodeGenOperatorFactory. The operatorCode above is wrapped in a GeneratedOperator, which is passed to CodeGenOperatorFactory and stored as its member generatedClass:

new GeneratedOperator(operatorName, operatorCode, ctx.references.toArray, ctx.tableConfig)

  In the createStreamOperator method of CodeGenOperatorFactory, the class is compiled and an instance is created:

public <T extends StreamOperator<OUT>> T createStreamOperator(
        StreamOperatorParameters<OUT> parameters) {
    return (T)
            generatedClass.newInstance(
                    parameters.getContainingTask().getUserCodeClassLoader(),
                    generatedClass.getReferences(),
                    parameters.getContainingTask(),
                    parameters.getStreamConfig(),
                    parameters.getOutput(),
                    processingTimeService);
}

  The compilation itself is triggered inside newInstance:

public T newInstance(ClassLoader classLoader, Object... args) {
    try {
        return (T) compile(classLoader).getConstructors()[0].newInstance(args);
    } catch (Exception e) {
        throw new RuntimeException(
                "Could not instantiate generated class '" + className + "'", e);
    }
}
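  The getConstructors()[0].newInstance(args) call above relies on the generated class having a single public constructor whose parameters match the argument list. A minimal sketch of that pattern with plain reflection follows. GeneratedLikeOperator is a hand-written stand-in for a generated class, and its two constructor parameters are an assumption made for this example, not Flink's actual signature.

```java
class ReflectiveInstantiationDemo {
    // Stand-in for a generated class: exactly one public constructor that
    // receives the "references" array plus one more runtime argument.
    public static final class GeneratedLikeOperator {
        final Object[] references;
        final String taskName;

        public GeneratedLikeOperator(Object[] references, String taskName) {
            this.references = references;
            this.taskName = taskName;
        }
    }

    // Same shape as the newInstance method shown above: pick the first public
    // constructor and pass the arguments through positionally.
    static Object newInstance(Class<?> clazz, Object... args) {
        try {
            return clazz.getConstructors()[0].newInstance(args);
        } catch (Exception e) {
            throw new RuntimeException(
                "Could not instantiate class '" + clazz.getName() + "'", e);
        }
    }

    public static void main(String[] args) {
        Object[] references = {"some-udf", 42};
        GeneratedLikeOperator op = (GeneratedLikeOperator)
            newInstance(GeneratedLikeOperator.class, references, "task-1");
        System.out.println(op.taskName + " has " + op.references.length + " references");
    }
}
```

  Passing objects through a references array is what lets generated source text refer to values that only exist at run time and cannot be written out as literals.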

  The underlying compiler is Janino:

private static <T> Class<T> doCompile(ClassLoader cl, String name, String code) {
    checkNotNull(cl, "Classloader must not be null.");
    CODE_LOG.debug("Compiling: {} \n\n Code:\n{}", name, code);
    SimpleCompiler compiler = new SimpleCompiler();
    compiler.setParentClassLoader(cl);
    try {
        compiler.cook(code);
    } catch (Throwable t) {
        System.out.println(addLineNumber(code));
        throw new InvalidProgramException(
                "Table program cannot be compiled. This is a bug. Please file an issue.", t);
    }
    try {
        //noinspection unchecked
        return (Class<T>) compiler.getClassLoader().loadClass(name);
    } catch (ClassNotFoundException e) {
        throw new RuntimeException("Can not load class " + name, e);
    }
}

4 Spark CodeGen

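  Spark CodeGen falls into two categories. The first, like Flink above, turns SQL into job code. The second is whole-stage codegen, which fuses several physical operators of one stage into a single generated function in order to improve CPU efficiency.

  Advantages: 1. a plain for loop replaces the chain of iterators, which removes the virtual function calls between operators; 2. intermediate data can stay in CPU registers, so data access is fast.

  Drawbacks: the generated code still runs on the JVM, so it cannot reach native speed, and it cannot directly use high-performance CPU features such as SIMD.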
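  To make the first advantage concrete, here is a small sketch that computes the same result twice: once in the iterator (Volcano) style, where each operator pulls rows from its child through hasNext()/next(), and once as the kind of single fused loop that whole-stage codegen aims for. The classes are written by hand for this example and are not Spark code; the query (sum of the even values) is arbitrary.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class FusedLoopDemo {
    // Iterator style: the filter is its own operator wrapping a child iterator,
    // so every row costs several interface calls and a boxed value.
    static final class EvenFilterIterator implements Iterator<Integer> {
        private final Iterator<Integer> child;
        private Integer pending; // next row that passed the predicate, or null

        EvenFilterIterator(Iterator<Integer> child) {
            this.child = child;
        }

        @Override
        public boolean hasNext() {
            while (pending == null && child.hasNext()) {
                Integer v = child.next();
                if (v % 2 == 0) pending = v;
            }
            return pending != null;
        }

        @Override
        public Integer next() {
            if (!hasNext()) throw new NoSuchElementException();
            Integer v = pending;
            pending = null;
            return v;
        }
    }

    static long sumIteratorStyle(List<Integer> rows) {
        Iterator<Integer> it = new EvenFilterIterator(rows.iterator());
        long sum = 0;
        while (it.hasNext()) sum += it.next();
        return sum;
    }

    // Fused style: scan, filter and aggregate share one loop, and the current
    // value lives in a local variable instead of being passed between objects.
    static long sumFusedStyle(int[] rows) {
        long sum = 0;
        for (int i = 0; i < rows.length; i++) {
            int v = rows[i];
            if (v % 2 == 0) sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumIteratorStyle(Arrays.asList(1, 2, 3, 4, 5, 6)));
        System.out.println(sumFusedStyle(new int[] {1, 2, 3, 4, 5, 6}));
    }
}
```

  Both calls in main compute 2 + 4 + 6. The difference is only in how much indirection sits between reading a row and using it.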

4.1 CollapseCodegenStages

  Whole-stage codegen starts in QueryExecution, which sets up the rules applied to the physical plan; CollapseCodegenStages is one of them:

// `RemoveRedundantSorts` needs to be added after `EnsureRequirements` to guarantee the same
// number of partitions when instantiating PartitioningCollection.
RemoveRedundantSorts,
DisableUnnecessaryBucketedScan,
ApplyColumnarRulesAndInsertTransitions(
  sparkSession.sessionState.columnarRules, outputsColumnar = false),
CollapseCodegenStages()) ++

  Its apply method decides whether to insert WholeStageCodegenExec nodes based on the configuration spark.sql.codegen.wholeStage (default true):

def apply(plan: SparkPlan): SparkPlan = {
  if (conf.wholeStageEnabled) {
    insertWholeStageCodegen(plan)
  } else {
    plan
  }
}

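  Whole-stage codegen involves two special nodes: WholeStageCodegenExec and InputAdapter. WholeStageCodegenExec is the core class.

  Not every node in the operator tree supports CodeGen, so the CollapseCodegenStages rule handles the two kinds of node differently, with CodeGen support as the dividing line. A run of consecutive nodes that support CodeGen is treated as one unit, and a WholeStageCodegenExec is placed on top of it to generate the code for the whole block. Where a supported node meets an unsupported child, an InputAdapter node is inserted, which acts as a leaf of that WholeStageCodegenExec block.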
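  The insertion strategy described above can be sketched on a toy plan tree. Node, insertWholeStage and insertInputAdapter are simplified stand-ins written for this example; the real rule operates on SparkPlan and handles more special cases. The example plan assumes an Exchange node that does not support CodeGen sitting between supported nodes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CollapseStagesDemo {
    // Toy plan node: a name, a flag for CodeGen support, and children.
    static final class Node {
        final String name;
        final boolean supportsCodegen;
        final List<Node> children;

        Node(String name, boolean supportsCodegen, Node... children) {
            this.name = name;
            this.supportsCodegen = supportsCodegen;
            this.children = Arrays.asList(children);
        }

        @Override
        public String toString() {
            if (children.isEmpty()) return name;
            List<String> parts = new ArrayList<>();
            for (Node c : children) parts.add(c.toString());
            return name + "(" + String.join(", ", parts) + ")";
        }
    }

    // Outside a codegen block: wrap the first supported node we meet.
    static Node insertWholeStage(Node plan) {
        if (plan.supportsCodegen) {
            return new Node("WholeStageCodegen", false, insertInputAdapter(plan));
        }
        return rebuild(plan, false);
    }

    // Inside a codegen block: cut the block at the first unsupported node.
    static Node insertInputAdapter(Node plan) {
        if (!plan.supportsCodegen) {
            return new Node("InputAdapter", true, insertWholeStage(plan));
        }
        return rebuild(plan, true);
    }

    // Copy the node, applying the matching strategy to every child.
    private static Node rebuild(Node plan, boolean insideBlock) {
        Node[] newChildren = new Node[plan.children.size()];
        for (int i = 0; i < newChildren.length; i++) {
            Node child = plan.children.get(i);
            newChildren[i] = insideBlock ? insertInputAdapter(child) : insertWholeStage(child);
        }
        return new Node(plan.name, plan.supportsCodegen, newChildren);
    }

    public static void main(String[] args) {
        Node plan = new Node("Project", true,
            new Node("Filter", true,
                new Node("Exchange", false,
                    new Node("Scan", true))));
        // prints WholeStageCodegen(Project(Filter(InputAdapter(Exchange(WholeStageCodegen(Scan))))))
        System.out.println(insertWholeStage(plan));
    }
}
```

  In the printed tree, Project and Filter form one block, the InputAdapter marks where that block ends, and the Scan below the Exchange starts a block of its own.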

4.2 CodegenSupport

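  CodegenSupport is a trait that extends SparkPlan and marks an operator as supporting CodeGen. It has many implementations, mostly named *Exec, which are the physical plan nodes for the corresponding SQL operators (for example, Filter maps to FilterExec). Its core is two pairs of methods, produce/doProduce and consume/doConsume, and each implementing class provides its own versions.

  The overall CodeGen flow of WholeStageCodegenExec is shown below. Starting from the parent, produce is called downwards through the children; the children then call back upwards into the parents' consume, and together these calls build the complete code.

  The code that CodeGen has to emit is defined in these two pairs of methods.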

*   WholeStageCodegen       Plan A               FakeInput        Plan B
* =========================================================================
*
* -> execute()
*     |
*  doExecute() --------->   inputRDDs() -------> inputRDDs() ------> execute()
*     |
*     +----------------->   produce()
*                             |
*                          doProduce()  -------> produce()
*                                                   |
*                                                doProduce()
*                                                   |
*                         doConsume() <--------- consume()
*                             |
*  doConsume()  <--------  consume()

  produce is defined in the parent trait CodegenSupport:

final def produce(ctx: CodegenContext, parent: CodegenSupport): String = executeQuery {
  this.parent = parent
  ctx.freshNamePrefix = variablePrefix
  s"""
     |${ctx.registerComment(s"PRODUCE: ${this.simpleString(conf.maxToStringFields)}")}
     |${doProduce(ctx)}
   """.stripMargin
}

  doProduce is implemented by the subclasses and is where code is actually generated. In most subclasses it simply calls the child's produce:

protected override def doProduce(ctx: CodegenContext): String = {
  child.asInstanceOf[CodegenSupport].produce(ctx, this)
}
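  To show how the downward produce calls and upward consume calls combine into one piece of code, here is a toy version of the protocol. Op, Scan, Filter, Project and Root are hypothetical classes written for this sketch; they only mimic the shape of CodegenSupport and generate strings for an imaginary row type (a single int). Root plays the role of WholeStageCodegenExec.

```java
class ProduceConsumeDemo {
    abstract static class Op {
        private Op parent;

        // produce: remember who asked, then let the concrete operator emit code.
        final String produce(Op parent) {
            this.parent = parent;
            return doProduce();
        }

        // consume: hand the current row variable up to the parent operator.
        final String consume(String rowVar) {
            return parent.doConsume(rowVar);
        }

        abstract String doProduce();

        abstract String doConsume(String rowVar);
    }

    // Leaf: emits the driving loop and calls consume for every row.
    static final class Scan extends Op {
        @Override String doProduce() {
            return "while (input.hasNext()) {\n"
                + "  int row = input.next();\n"
                + indent(consume("row"))
                + "}\n";
        }
        @Override String doConsume(String rowVar) {
            throw new UnsupportedOperationException("a leaf has no child to consume from");
        }
    }

    static final class Filter extends Op {
        private final Op child;
        private final String predicate; // format string, e.g. "%s > 10"
        Filter(Op child, String predicate) { this.child = child; this.predicate = predicate; }
        @Override String doProduce() { return child.produce(this); }
        @Override String doConsume(String rowVar) {
            return "if (" + String.format(predicate, rowVar) + ") {\n"
                + indent(consume(rowVar))
                + "}\n";
        }
    }

    static final class Project extends Op {
        private final Op child;
        private final String expr; // format string, e.g. "%s * 2"
        Project(Op child, String expr) { this.child = child; this.expr = expr; }
        @Override String doProduce() { return child.produce(this); }
        @Override String doConsume(String rowVar) {
            return "int out = " + String.format(expr, rowVar) + ";\n" + consume("out");
        }
    }

    // Top of the block: starts produce and terminates the consume chain.
    static final class Root extends Op {
        private final Op child;
        Root(Op child) { this.child = child; }
        @Override String doProduce() { return child.produce(this); }
        @Override String doConsume(String rowVar) { return "append(" + rowVar + ");\n"; }
    }

    static String indent(String code) {
        StringBuilder sb = new StringBuilder();
        for (String line : code.split("\n")) sb.append("  ").append(line).append("\n");
        return sb.toString();
    }

    static String generate() {
        Op plan = new Root(new Project(new Filter(new Scan(), "%s > 10"), "%s * 2"));
        return plan.doProduce();
    }

    public static void main(String[] args) {
        System.out.print(generate());
    }
}
```

  The generated text is a single while loop with the filter's if nested inside it and the projection and append nested inside the if: three operators, one loop, no calls between them.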

4.3 WholeStageCodegenExec

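  doExecute is the execution entry point. It first calls doCodeGen to generate the code, which then has to be compiled. If compilation fails and codegenFallback is enabled, execution falls back to the non-CodeGen path: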

override def doExecute(): RDD[InternalRow] = {
  val (ctx, cleanedSource) = doCodeGen()
  // try to compile and fallback if it failed
  val (_, compiledCodeStats) = try {
    CodeGenerator.compile(cleanedSource)
  } catch {
    case NonFatal(_) if !Utils.isTesting && conf.codegenFallback =>
      // We should already saw the error message
      logWarning(s"Whole-stage codegen disabled for plan (id=$codegenStageId):\n $treeString")
      return child.execute()
  }

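  Besides compilation failure, there is also a check on the bytecode size of the largest generated method. If it exceeds hugeMethodLimit, execution falls back as well: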

// Check if compiled code has a too large function
if (compiledCodeStats.maxMethodCodeSize > conf.hugeMethodLimit) {
  logInfo(s"Found too long generated codes and JIT optimization might not work: " +
    s"the bytecode size (${compiledCodeStats.maxMethodCodeSize}) is above the limit " +
    s"${conf.hugeMethodLimit}, and the whole-stage codegen was disabled " +
    s"for this plan (id=$codegenStageId). To avoid this, you can raise the limit " +
    s"`${SQLConf.WHOLESTAGE_HUGE_METHOD_LIMIT.key}`:\n$treeString")
  return child.execute()
}
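  The two fallback conditions can be summarised in a few lines. The sketch below is not Spark code: SourceCompiler, CompileStats and choosePath are hypothetical names, and the returned strings merely label which path would be taken.

```java
class CodegenFallbackDemo {
    // Hypothetical result of a compilation: only the statistic that matters here.
    static final class CompileStats {
        final int maxMethodCodeSize;
        CompileStats(int maxMethodCodeSize) {
            this.maxMethodCodeSize = maxMethodCodeSize;
        }
    }

    // Hypothetical compiler abstraction standing in for the real compile step.
    interface SourceCompiler {
        CompileStats compile(String source) throws Exception;
    }

    static String choosePath(String source, SourceCompiler compiler,
                             boolean fallbackEnabled, int hugeMethodLimit) {
        CompileStats stats;
        try {
            stats = compiler.compile(source);
        } catch (Exception e) {
            // Condition 1: compilation failed and fallback is allowed.
            if (fallbackEnabled) return "fallback";
            throw new RuntimeException("Compilation failed and fallback is disabled", e);
        }
        // Condition 2: a generated method is too large for the JIT to handle well.
        if (stats.maxMethodCodeSize > hugeMethodLimit) return "fallback";
        return "codegen";
    }

    public static void main(String[] args) {
        System.out.println(choosePath("class A {}", src -> new CompileStats(120), true, 8000));
    }
}
```

  The size check exists because, as the log message above says, JIT optimization might not work for very large methods, which would lose much of the benefit of generating the code in the first place.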

  Spark ultimately still runs on RDDs, so once the code has compiled the next step is the conversion to an RDD:

// Even though rdds is an RDD[InternalRow] it may actually be an RDD[ColumnarBatch] with
// type erasure hiding that. This allows for the input to a code gen stage to be columnar,
// but the output must be rows.
val rdds = child.asInstanceOf[CodegenSupport].inputRDDs()

  There is a limit on the RDD side too: at most two inputs are supported.

assert(rdds.size <= 2, "Up to two input RDDs can be supported")
if (rdds.length == 1) {
  rdds.head.mapPartitionsWithIndex { (index, iter) =>
    val (clazz, _) = CodeGenerator.compile(cleanedSource)
    val buffer = clazz.generate(references).asInstanceOf[BufferedRowIterator]
    buffer.init(index, Array(iter))
    new Iterator[InternalRow] {
      override def hasNext: Boolean = {
        val v = buffer.hasNext
        if (!v) durationMs += buffer.durationMs()
        v
      }
      override def next: InternalRow = buffer.next()
    }
  }
} else {
  // Right now, we support up to two input RDDs.
  rdds.head.zipPartitions(rdds(1)) { (leftIter, rightIter) =>
    Iterator((leftIter, rightIter))
    // a small hack to obtain the correct partition index
  }.mapPartitionsWithIndex { (index, zippedIter) =>
    val (leftIter, rightIter) = zippedIter.next()
    val (clazz, _) = CodeGenerator.compile(cleanedSource)
    val buffer = clazz.generate(references).asInstanceOf[BufferedRowIterator]
    buffer.init(index, Array(leftIter, rightIter))
    new Iterator[InternalRow] {
      override def hasNext: Boolean = {
        val v = buffer.hasNext
        if (!v) durationMs += buffer.durationMs()
        v
      }
      override def next: InternalRow = buffer.next()
    }
  }
}

4.3.1 doCodeGen

  doCodeGen first calls the child's produce, which generates the code following the flow described earlier:

val code = child.asInstanceOf[CodegenSupport].produce(ctx, this)

  The generated code does the data processing and is placed in the processNext method of the final class:

// main next function.
ctx.addNewFunction("processNext",
  s"""
    protected void processNext() throws java.io.IOException {
      ${code.trim}
    }
   """, inlineToOuterClass = true)

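  After that, the source of the whole GeneratedIterator class is assembled. The name returned by generatedClassName is usually GeneratedIterator, possibly with a codegen stage suffix, and this is the class that is finally executed: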

val className = generatedClassName()

val source = s"""
  public Object generate(Object[] references) {
    return new $className(references);
  }

  ${ctx.registerComment(
    s"""Codegened pipeline for stage (id=$codegenStageId) |${this.treeString.trim}""".stripMargin,
     "wsc_codegenPipeline")}
  ${ctx.registerComment(s"codegenStageId=$codegenStageId", "wsc_codegenStageId", true)}
  final class $className extends ${classOf[BufferedRowIterator].getName} {

    private Object[] references;
    private scala.collection.Iterator[] inputs;
    ${ctx.declareMutableStates()}

    public $className(Object[] references) {
      this.references = references;
    }

    public void init(int index, scala.collection.Iterator[] inputs) {
      partitionIndex = index;
      this.inputs = inputs;
      ${ctx.initMutableStates()}
      ${ctx.initPartition()}
    }

    ${ctx.emitExtraCode()}

    ${ctx.declareAddedFunctions()}
  }
  """.trim

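4.3.2 Generating the Code

  Generating the code is the process described earlier: produce is called downwards and control comes back up through consume. The walkthrough below uses the simplest case, select … from … where.
  (The flow diagram from the original post is not available here.)

  The doProduce of the first two layers just calls produce on the layer below, until FileSourceScanExec is reached. Its doProduce comes from the parent trait InputRDDCodegen, and its core is the while loop shown below, which calls consume to call back into the operator above: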

s"""
   | while ($limitNotReachedCond $input.hasNext()) {
   |   InternalRow $row = (InternalRow) $input.next();
   |   ${updateNumOutputRowsMetrics}
   |   ${consume(ctx, outputVars, if (createUnsafeProjection) null else row).trim}
   |   ${shouldStopCheckCode}
   | }
 """.stripMargin

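  consume is the method inherited from CodegenSupport. As mentioned, a method body that is too long causes a fallback to non-CodeGen mode, so consume checks here whether the parent's consume logic should be split out into a separate function. Either way, the logic comes down to calling the parent operator's doConsume: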

// Under certain conditions, we can put the logic to consume the rows of this operator into
// another function. So we can prevent a generated function too long to be optimized by JIT.
// The conditions:
// 1. The config "spark.sql.codegen.splitConsumeFuncByOperator" is enabled.
// 2. `inputVars` are all materialized. That is guaranteed to be true if the parent plan uses
//    all variables in output (see `requireAllOutput`).
// 3. The number of output variables must less than maximum number of parameters in Java method
//    declaration.
val confEnabled = conf.wholeStageSplitConsumeFuncByOperator
val requireAllOutput = output.forall(parent.usedInputs.contains(_))
val paramLength = CodeGenerator.calculateParamLength(output) + (if (row != null) 1 else 0)
val consumeFunc = if (confEnabled && requireAllOutput
    && CodeGenerator.isValidParamLength(paramLength)) {
  constructDoConsumeFunction(ctx, inputVars, row)
} else {
  parent.doConsume(ctx, inputVars, rowVar)
}

  Once the parent's method has returned, the generated code is assembled in this structure:

s"""
   |${ctx.registerComment(s"CONSUME: ${parent.simpleString(conf.maxToStringFields)}")}
   |$evaluated
   |$consumeFunc
 """.stripMargin

  The parent operator FilterExec builds its own logic in doConsume and then calls consume in turn.
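  The FilterExec template that follows wraps its output in do { } while(false);. A short aside on why: inside such a block, continue jumps to the while(false) check and so leaves the block, which lets a failed predicate skip the rest of the row's processing without skipping code that the enclosing loop places after the block. The sketch below is hand-written Java that imitates this shape; the predicates and names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

class DoWhileFalseDemo {
    final List<Integer> output = new ArrayList<>();
    int rowsSeen = 0;

    void processAll(int[] rows) {
        for (int row : rows) {              // the loop a scan operator would emit
            do {
                // Each failed check leaves the do-block via "continue".
                if (!(row > 10)) continue;
                if (!(row % 2 == 0)) continue;
                output.add(row * 2);        // stands in for the parent's consume code
            } while (false);
            // Code after the block still runs for every row, filtered or not.
            rowsSeen++;
        }
    }

    public static void main(String[] args) {
        DoWhileFalseDemo demo = new DoWhileFalseDemo();
        demo.processAll(new int[] {5, 12, 13, 20});
        System.out.println(demo.output + " from " + demo.rowsSeen + " rows");
    }
}
```

  Without the wrapper, a continue emitted by the filter would jump straight to the next iteration of the outer loop and bypass whatever the scan appended after the consume code.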

// Note: wrap in "do { } while(false);", so the generated checks can jump out with "continue;"
s"""
   |do {
   |  $predicateCode
   |  $numOutput.add(1);
   |  ${consume(ctx, resultVars)}
   |} while(false);
 """.stripMargin

  This consume again follows the CodegenSupport logic above and calls into ProjectExec:

// Evaluation of non-deterministic expressions can't be deferred.
val nonDeterministicAttrs = projectList.filterNot(_.deterministic).map(_.toAttribute)
s"""
   |// common sub-expressions
   |${evaluateVariables(localValInputs)}
   |$subExprsCode
   |${evaluateRequiredVariables(output, resultVars, AttributeSet(nonDeterministicAttrs))}
   |${consume(ctx, resultVars)}
 """.stripMargin

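4.4 Compilation

  As mentioned earlier, the generated code is compiled first, and the compiler used here is also Janino.

  Janino describes itself as "a super-small, super-fast Java™ compiler". Like javac, it can compile Java source files into bytecode, but it can also compile Java expressions, blocks, classes and source files held in memory, load the bytecode and execute it directly in the JVM. Janino is meant to be an embedded compiler used at run time rather than a development tool.

  The official usage example is shown below. It essentially calls the Janino API to compile code held in a string: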

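import java.lang.reflect.InvocationTargetException;

import org.codehaus.commons.compiler.CompileException;
import org.codehaus.janino.ScriptEvaluator;

public class Main {

    public static void main(String[] args) throws CompileException, NumberFormatException, InvocationTargetException {

        ScriptEvaluator se = new ScriptEvaluator();

        // Compile a script that declares two static methods and calls both.
        se.cook(
            ""
            + "static void method1() {\n"
            + "    System.out.println(1);\n"
            + "}\n"
            + "\n"
            + "method1();\n"
            + "method2();\n"
            + "\n"
            + "static void method2() {\n"
            + "    System.out.println(2);\n"
            + "}\n"
        );

        // Run the compiled script.
        se.evaluate();
    }
}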