1. Interpreted Execution and Compiled Execution
Interpreted execution: the output is not machine code; an intermediate interpreter has to translate it into machine instructions and execute them, reading and translating one line at a time, so execution is relatively slow.
Compiled execution: machine code is generated directly, so execution is fast, but development is harder and cross-platform portability is poor.
2. JIT
JIT (just in time) means just-in-time compilation.
Java actually combines interpreted and compiled execution; the compiled path is implemented with JIT.
When the JVM notices that a method or code block runs particularly often, it treats it as "hot spot code" (Hot Spot Code). The JVM then uses the JIT compiler to compile the hot code into optimized native machine code, and subsequent invocations run that machine code directly, which is much faster (in HotSpot the compiled instructions are stored in the CodeCache).
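A minimal way to watch this happen (my own demo, not taken from any framework) is to run a small hot loop with the HotSpot flag -XX:+PrintCompilation and look for the hot method in the compilation log:

public class HotLoopDemo {

    // A small method the JVM will treat as hot after enough invocations.
    static long hotSum(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        long total = 0;
        // Call the method many times so the invocation counter crosses the compile threshold.
        for (int i = 0; i < 100_000; i++) {
            total += hotSum(1_000);
        }
        System.out.println(total);
    }
}

Running it with java -XX:+PrintCompilation prints a line for HotLoopDemo::hotSum once the JIT has compiled it.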
2.1 Front-end Compilation and Back-end Compilation
Front-end compilation is the source-side process: turning .java files into .class files.
Back-end compilation is the target-side process: turning .class files into machine code; JIT belongs to back-end compilation.
2.2 Why not compile all code with JIT?
First, if a piece of code will only ever run once, compiling it is wasted effort: interpreting the Java bytecode directly is much faster than compiling it and then executing the result.
Second, better optimization: the more often the JVM executes a method or loop, the more it learns about the code's behavior, so it can apply better optimizations when it eventually compiles it.
2.3 Hot Spot Detection
The JVM detects hot spots with counters, mainly two of them: the method invocation counter and the back-edge (loop) counter.
The counters do not accumulate forever; there is a time window and periodic decay, and corresponding JVM flags (for example -XX:CompileThreshold and -XX:-UseCounterDecay) can tune this behavior.
3 Flink CodeGen
Flink already uses CodeGen in Flink SQL, but the goal there is not CPU performance optimization; it is the translation from the physical plan to Transformations. Performance-oriented codegen features have been developed in the latest versions.
Take CommonExecCalc as an example: its translateToPlanInternal method invokes CodeGen to do the translation.
final CodeGenOperatorFactory<RowData> substituteStreamOperator =
CalcCodeGenerator.generateCalcOperator(
ctx,
inputTransform,
(RowType) getOutputType(),
JavaScalaConversionUtil.toScala(projection),
JavaScalaConversionUtil.toScala(Optional.ofNullable(this.condition)),
retainHeader,
getClass().getSimpleName());
CalcCodeGenerator contains the following code-generation step; the generated code is simply the string inside:
if (onlyFilter) {
s"""
|${if (eagerInputUnboxingCode) ctx.reuseInputUnboxingCode() else ""}
|${filterCondition.code}
|if (${filterCondition.resultTerm}) {
| ${produceOutputCode(inputTerm)}
|}
|""".stripMargin
In the next step, the method that produces the Operator shows code that looks much more like the business code we normally write by hand:
val operatorCode =
j"""
public class $operatorName extends ${abstractBaseClass.getCanonicalName}
implements ${baseClass.getCanonicalName}$endInputImpl {
private final Object[] references;
${ctx.reuseMemberCode()}
public $operatorName(
Object[] references,
${className[StreamTask[_, _]]} task,
${className[StreamConfig]} config,
${className[Output[_]]} output,
${className[ProcessingTimeService]} processingTimeService) throws Exception {
this.references = references;
${ctx.reuseInitCode()}
this.setup(task, config, output);
if (this instanceof ${className[AbstractStreamOperator[_]]}) {
((${className[AbstractStreamOperator[_]]}) this)
.setProcessingTimeService(processingTimeService);
}
}
@Override
public void open() throws Exception {
super.open();
${ctx.reuseOpenCode()}
}
@Override
public void processElement($STREAM_RECORD $ELEMENT) throws Exception {
$inputTypeTerm $inputTerm = ($inputTypeTerm) ${converter(s"$ELEMENT.getValue()")};
${ctx.reusePerRecordCode()}
${ctx.reuseLocalVariableCode()}
${if (lazyInputUnboxingCode) "" else ctx.reuseInputUnboxingCode()}
$processCode
}
$endInput
@Override
public void finish() throws Exception {
${ctx.reuseFinishCode()}
super.finish();
}
@Override
public void close() throws Exception {
super.close();
${ctx.reuseCloseCode()}
}
${ctx.reuseInnerClassDefinitionCode()}
}
""".stripMargin
The code is compiled in CodeGenOperatorFactory: the operatorCode above is first wrapped into a GeneratedOperator, which is then passed into CodeGenOperatorFactory as its generatedClass member:
new GeneratedOperator(operatorName, operatorCode, ctx.references.toArray, ctx.tableConfig)
In CodeGenOperatorFactory's createStreamOperator method, the class is compiled and an instance is created:
public <T extends StreamOperator<OUT>> T createStreamOperator(
StreamOperatorParameters<OUT> parameters) {
return (T)
generatedClass.newInstance(
parameters.getContainingTask().getUserCodeClassLoader(),
generatedClass.getReferences(),
parameters.getContainingTask(),
parameters.getStreamConfig(),
parameters.getOutput(),
processingTimeService);
}
The compilation happens inside newInstance:
public T newInstance(ClassLoader classLoader, Object... args) {
try {
return (T) compile(classLoader).getConstructors()[0].newInstance(args);
} catch (Exception e) {
throw new RuntimeException(
"Could not instantiate generated class '" + className + "'", e);
}
}
The underlying compiler is Janino:
private static <T> Class<T> doCompile(ClassLoader cl, String name, String code) {
checkNotNull(cl, "Classloader must not be null.");
CODE_LOG.debug("Compiling: {} \n\n Code:\n{}", name, code);
SimpleCompiler compiler = new SimpleCompiler();
compiler.setParentClassLoader(cl);
try {
compiler.cook(code);
} catch (Throwable t) {
System.out.println(addLineNumber(code));
throw new InvalidProgramException(
"Table program cannot be compiled. This is a bug. Please file an issue.", t);
}
try {
//noinspection unchecked
return (Class<T>) compiler.getClassLoader().loadClass(name);
} catch (ClassNotFoundException e) {
throw new RuntimeException("Can not load class " + name, e);
}
}
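Outside of Flink, the same pattern can be reproduced with Janino directly. The following is a minimal sketch (the generated class and its contents are invented for illustration): it cooks a source string with SimpleCompiler, loads the resulting class, and instantiates it reflectively, which is essentially what the compile-plus-newInstance path above does:

import java.util.function.Supplier;

import org.codehaus.janino.SimpleCompiler;

public class JaninoSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical generated source, standing in for Flink's operatorCode string.
        String code =
                "public class GeneratedGreeter implements java.util.function.Supplier {\n"
                + "    public Object get() { return \"hello from generated code\"; }\n"
                + "}\n";

        SimpleCompiler compiler = new SimpleCompiler();
        compiler.setParentClassLoader(JaninoSketch.class.getClassLoader());
        compiler.cook(code);  // compile the source string in memory

        Class<?> clazz = compiler.getClassLoader().loadClass("GeneratedGreeter");
        Supplier<?> greeter = (Supplier<?>) clazz.getConstructors()[0].newInstance();
        System.out.println(greeter.get());
    }
}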
4 Spark CodeGen
Spark CodeGen falls into two categories. The first, like Flink above, translates the SQL plan into job code; the second is whole-stage codegen (wholeStage), which fuses multiple operators into a single piece of generated code in order to improve CPU performance.
Advantages: 1. a for loop replaces the iterator chain, completely eliminating the per-row virtual function calls; 2. intermediate data stays in CPU registers, so data access is fast.
Limitations: the generated code still runs on the JVM, so it does not reach native speed and cannot directly exploit high-performance CPU mechanisms such as SIMD.
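To make the "for loop instead of iterators" advantage concrete, here is a hedged illustration in plain Java (names, predicate and data are invented; this is not Spark's actual generated code) contrasting a Volcano-style iterator chain with the fused loop that whole-stage codegen effectively produces for a scan, filter and project pipeline:

import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class FusedLoopSketch {

    // Filter operator as an iterator wrapping the scan iterator.
    static Iterator<int[]> filterIterator(Iterator<int[]> child) {
        return new Iterator<int[]>() {
            int[] pending;
            public boolean hasNext() {
                while (pending == null && child.hasNext()) {
                    int[] row = child.next();        // virtual call into the scan
                    if (row[0] > 10) pending = row;  // the WHERE predicate
                }
                return pending != null;
            }
            public int[] next() {
                if (!hasNext()) throw new NoSuchElementException();
                int[] row = pending;
                pending = null;
                return row;
            }
        };
    }

    // Volcano style: every row pays virtual hasNext()/next() calls per operator in the chain.
    static long volcanoStyle(Iterator<int[]> scan) {
        Iterator<int[]> filtered = filterIterator(scan);
        long result = 0;
        while (filtered.hasNext()) {
            result += filtered.next()[0] * 2L;       // the SELECT projection
        }
        return result;
    }

    // Whole-stage style: scan, filter and projection fused into one loop;
    // no per-row virtual calls, intermediate values stay in locals/registers.
    static long fusedStyle(List<int[]> rows) {
        long result = 0;
        for (int[] row : rows) {
            if (row[0] > 10) {
                result += row[0] * 2L;
            }
        }
        return result;
    }
}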
4.1 CollapseCodegenStages
Whole-stage codegen starts here: QueryExecution installs a list of preparation rules, one of which is CollapseCodegenStages.
// `RemoveRedundantSorts` needs to be added after `EnsureRequirements` to guarantee the same
// number of partitions when instantiating PartitioningCollection.
RemoveRedundantSorts,
DisableUnnecessaryBucketedScan,
ApplyColumnarRulesAndInsertTransitions(
sparkSession.sessionState.columnarRules, outputsColumnar = false),
CollapseCodegenStages()) ++
Its apply method decides, based on the config spark.sql.codegen.wholeStage (default true), whether to insert WholeStageCodegenExec nodes:
def apply(plan: SparkPlan): SparkPlan = {
if (conf.wholeStageEnabled) {
insertWholeStageCodegen(plan)
} else {
plan
}
}
Whole-stage codegen involves two special nodes, WholeStageCodegenExec and InputAdapter; WholeStageCodegenExec is the core class.
Not every node in the operator tree supports CodeGen, so the CollapseCodegenStages rule treats the two kinds of nodes differently. With CodeGen support as the boundary, a run of consecutive CodeGen-capable nodes forms one unit and gets a WholeStageCodegenExec inserted on top, which is responsible for generating the code of that whole block; when a supported node meets an unsupported child, an InputAdapter node is inserted, which effectively acts as a leaf of the WholeStageCodegenExec block. A hypothetical plan is sketched below.
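For example (a made-up plan, purely to illustrate the rule), if the project and filter support codegen but the shuffle exchange below them does not, the tree is rewritten roughly like this:

before:
ProjectExec
  FilterExec
    ShuffleExchangeExec
      FileSourceScanExec

after:
WholeStageCodegenExec
  ProjectExec
    FilterExec
      InputAdapter
        ShuffleExchangeExec
          WholeStageCodegenExec
            FileSourceScanExec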
4.2 CodegenSupport
CodegenSupport is a trait extending SparkPlan that marks CodeGen support. It has a series of implementations, almost all named *Exec, i.e. the physical plan nodes of the corresponding SQL operators (for example the Filter node corresponds to FilterExec). Its core is two pairs of methods, produce/doProduce and consume/doConsume, which each implementing class provides.
The overall code-generation flow around WholeStageCodegenExec is shown in the diagram below: starting from the parent node, produce keeps calling down into the children, and the children then call back up into their parents' consume, until the whole piece of code has been generated.
The code that CodeGen has to emit is defined inside these two pairs of methods.
 *   WholeStageCodegen       Plan A               FakeInput        Plan B
 * =========================================================================
 *
 * -> execute()
 *     |
 *  doExecute() --------->   inputRDDs() -------> inputRDDs() ------> execute()
 *     |
 *     +----------------->   produce()
 *                             |
 *                          doProduce()  -------> produce()
 *                                                   |
 *                                                doProduce()
 *                                                   |
 *                         doConsume() <--------- consume()
 *                             |
 *  doConsume()  <--------  consume()
produce is defined in the parent trait CodegenSupport:
final def produce(ctx: CodegenContext, parent: CodegenSupport): String = executeQuery {
this.parent = parent
ctx.freshNamePrefix = variablePrefix
s"""
|${ctx.registerComment(s"PRODUCE: ${this.simpleString(conf.maxToStringFields)}")}
|${doProduce(ctx)}
""".stripMargin
}
doProduce is implemented by the subclasses and is where the actual code generation happens; most subclasses simply delegate to their child's produce:
protected override def doProduce(ctx: CodegenContext): String = {
child.asInstanceOf[CodegenSupport].produce(ctx, this)
}
4.3 WholeStageCodegenExec
doExecute is the execution entry point. It first calls doCodeGen to produce the source and then tries to compile it; if compilation fails, execution falls back to the non-CodeGen path:
override def doExecute(): RDD[InternalRow] = {
val (ctx, cleanedSource) = doCodeGen()
// try to compile and fallback if it failed
val (_, compiledCodeStats) = try {
CodeGenerator.compile(cleanedSource)
} catch {
case NonFatal(_) if !Utils.isTesting && conf.codegenFallback =>
// We should already saw the error message
logWarning(s"Whole-stage codegen disabled for plan (id=$codegenStageId):\n $treeString")
return child.execute()
}
Besides compilation failure, there is also a check on the size of the generated method body; if it exceeds the limit, execution falls back as well:
// Check if compiled code has a too large function
if (compiledCodeStats.maxMethodCodeSize > conf.hugeMethodLimit) {
logInfo(s"Found too long generated codes and JIT optimization might not work: " +
s"the bytecode size (${compiledCodeStats.maxMethodCodeSize}) is above the limit " +
s"${conf.hugeMethodLimit}, and the whole-stage codegen was disabled " +
s"for this plan (id=$codegenStageId). To avoid this, you can raise the limit " +
s"`${SQLConf.WHOLESTAGE_HUGE_METHOD_LIMIT.key}`:\n$treeString")
return child.execute()
}
Spark ultimately still runs on RDDs, so once the code has been compiled, the result is turned into an RDD:
// Even though rdds is an RDD[InternalRow] it may actually be an RDD[ColumnarBatch] with
// type erasure hiding that. This allows for the input to a code gen stage to be columnar,
// but the output must be rows.
val rdds = child.asInstanceOf[CodegenSupport].inputRDDs()
The RDD side has a limitation as well: at most two input RDDs are supported:
assert(rdds.size <= 2, "Up to two input RDDs can be supported")
if (rdds.length == 1) {
rdds.head.mapPartitionsWithIndex { (index, iter) =>
val (clazz, _) = CodeGenerator.compile(cleanedSource)
val buffer = clazz.generate(references).asInstanceOf[BufferedRowIterator]
buffer.init(index, Array(iter))
new Iterator[InternalRow] {
override def hasNext: Boolean = {
val v = buffer.hasNext
if (!v) durationMs += buffer.durationMs()
v
}
override def next: InternalRow = buffer.next()
}
}
} else {
// Right now, we support up to two input RDDs.
rdds.head.zipPartitions(rdds(1)) { (leftIter, rightIter) =>
Iterator((leftIter, rightIter))
// a small hack to obtain the correct partition index
}.mapPartitionsWithIndex { (index, zippedIter) =>
val (leftIter, rightIter) = zippedIter.next()
val (clazz, _) = CodeGenerator.compile(cleanedSource)
val buffer = clazz.generate(references).asInstanceOf[BufferedRowIterator]
buffer.init(index, Array(leftIter, rightIter))
new Iterator[InternalRow] {
override def hasNext: Boolean = {
val v = buffer.hasNext
if (!v) durationMs += buffer.durationMs()
v
}
override def next: InternalRow = buffer.next()
}
}
}
4.3.1 doCodeGen
It first calls the child's produce, which generates the code following the flow described earlier:
val code = child.asInstanceOf[CodegenSupport].produce(ctx, this)
The generated code does the per-row processing and is placed into the processNext method of the final class:
// main next function.
ctx.addNewFunction("processNext",
s"""
protected void processNext() throws java.io.IOException {
${code.trim}
}
""", inlineToOuterClass = true)
After that, the source of the whole generated class is assembled; generatedClassName() normally yields GeneratedIterator (possibly with the codegen stage id appended), and this generated iterator is what ultimately gets executed:
val className = generatedClassName()
val source = s"""
public Object generate(Object[] references) {
return new $className(references);
}
${ctx.registerComment(
s"""Codegened pipeline for stage (id=$codegenStageId) |${this.treeString.trim}""".stripMargin,
"wsc_codegenPipeline")}
${ctx.registerComment(s"codegenStageId=$codegenStageId", "wsc_codegenStageId", true)}
final class $className extends ${classOf[BufferedRowIterator].getName} {
private Object[] references;
private scala.collection.Iterator[] inputs;
${ctx.declareMutableStates()}
public $className(Object[] references) {
this.references = references;
}
public void init(int index, scala.collection.Iterator[] inputs) {
partitionIndex = index;
this.inputs = inputs;
${ctx.initMutableStates()}
${ctx.initPartition()}
}
${ctx.emitExtraCode()}
${ctx.declareAddedFunctions()}
}
""".trim
4.3.2 Generating the code
Generating the code is exactly the produce-down / consume-up flow described earlier. Take the simplest select … from … where query as an example.
The doProduce of the first two operators simply delegates to the child's produce, down to FileSourceScanExec, whose doProduce comes from the parent trait InputRDDCodegen; its core job is to emit the while loop shown below, which calls consume to call back up into the parent operator:
s"""
| while ($limitNotReachedCond $input.hasNext()) {
| InternalRow $row = (InternalRow) $input.next();
| ${updateNumOutputRowsMetrics}
| ${consume(ctx, outputVars, if (createUnsafeProjection) null else row).trim}
| ${shouldStopCheckCode}
| }
""".stripMargin
consume is the logic of the parent trait CodegenSupport. As mentioned earlier, an over-long method body forces a fallback to the non-CodeGen mode, so consume may split the consume logic into a separate function; either way it ends up calling the parent operator's doConsume:
// Under certain conditions, we can put the logic to consume the rows of this operator into
// another function. So we can prevent a generated function too long to be optimized by JIT.
// The conditions:
// 1. The config "spark.sql.codegen.splitConsumeFuncByOperator" is enabled.
// 2. `inputVars` are all materialized. That is guaranteed to be true if the parent plan uses
// all variables in output (see `requireAllOutput`).
// 3. The number of output variables must less than maximum number of parameters in Java method
// declaration.
val confEnabled = conf.wholeStageSplitConsumeFuncByOperator
val requireAllOutput = output.forall(parent.usedInputs.contains(_))
val paramLength = CodeGenerator.calculateParamLength(output) + (if (row != null) 1 else 0)
val consumeFunc = if (confEnabled && requireAllOutput
&& CodeGenerator.isValidParamLength(paramLength)) {
constructDoConsumeFunction(ctx, inputVars, row)
} else {
parent.doConsume(ctx, inputVars, rowVar)
}
Once the call into the parent returns, consume assembles the resulting code structure:
s"""
|${ctx.registerComment(s"CONSUME: ${parent.simpleString(conf.maxToStringFields)}")}
|$evaluated
|$consumeFunc
""".stripMargin
The parent FilterExec's doConsume builds its own logic and then calls consume again:
// Note: wrap in "do { } while(false);", so the generated checks can jump out with "continue;"
s"""
|do {
| $predicateCode
| $numOutput.add(1);
| ${consume(ctx, resultVars)}
|} while(false);
""".stripMargin
That consume again goes through the CodegenSupport logic above and ends up in ProjectExec's doConsume:
// Evaluation of non-deterministic expressions can't be deferred.
val nonDeterministicAttrs = projectList.filterNot(_.deterministic).map(_.toAttribute)
s"""
|// common sub-expressions
|${evaluateVariables(localValInputs)}
|$subExprsCode
|${evaluateRequiredVariables(output, resultVars, AttributeSet(nonDeterministicAttrs))}
|${consume(ctx, resultVars)}
""".stripMargin
4.4 Compilation
As mentioned above, the generated code is compiled before it runs, and the compiler used is again Janino.
Janino is a super-small, super-fast open-source Java compiler ("Janino is a super-small, super-fast Java™ compiler."). Besides compiling Java source files into bytecode like javac, Janino can compile Java expressions, blocks, classes and source files held in memory, load the bytecode and execute it directly in the JVM. Janino is not a development tool but an embedded compiler used at run time.
The official usage example is shown below; essentially you call Janino's API to compile a code string directly:
import java.lang.reflect.InvocationTargetException;

import org.codehaus.commons.compiler.CompileException;
import org.codehaus.janino.ScriptEvaluator;

public class Main {
public static void main(String[] args) throws CompileException, NumberFormatException, InvocationTargetException {
ScriptEvaluator se = new ScriptEvaluator();
se.cook(
""
+ "static void method1() {\n"
+ " System.out.println(1);\n"
+ "}\n"
+ "\n"
+ "method1();\n"
+ "method2();\n"
+ "\n"
+ "static void method2() {\n"
+ " System.out.println(2);\n"
+ "}\n"
);
se.evaluate();
}
}