1. Interpreted Execution and Compiled Execution
Interpreted execution: the output is not machine code; an intermediate interpreter has to translate it into machine instructions and execute them, reading and translating one line at a time, so execution is relatively slow.
Compiled execution: machine code is generated directly, so execution is fast, but development is harder and cross-platform portability is poor.
2. JIT
JIT (just in time) means just-in-time compilation.
Java actually combines interpreted and compiled execution; the compiled path is implemented with JIT.
When the JVM notices that a method or code block runs particularly often, it treats it as "hot spot code" (Hot Spot Code). The JVM then uses the JIT compiler to compile the hot code into optimized native machine code, and subsequent invocations run that machine code directly, which is much faster (in HotSpot the compiled instructions are stored in the CodeCache).
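A minimal way to watch this happen (my own demo, not taken from any framework) is to run a small hot loop with the HotSpot flag -XX:+PrintCompilation and look for the hot method in the compilation log:

public class HotLoopDemo {

    // A small method the JVM will treat as hot after enough invocations.
    static long hotSum(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        long total = 0;
        // Call the method many times so the invocation counter crosses the compile threshold.
        for (int i = 0; i < 100_000; i++) {
            total += hotSum(1_000);
        }
        System.out.println(total);
    }
}

Running it with java -XX:+PrintCompilation prints a line for HotLoopDemo::hotSum once the JIT has compiled it.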
2.1 Front-end Compilation and Back-end Compilation
Front-end compilation is the source-side process: turning .java files into .class files.
Back-end compilation is the target-side process: turning .class files into machine code; JIT belongs to back-end compilation.
2.2 Why not compile all code with JIT?
First, if a piece of code will only ever run once, compiling it is wasted effort: interpreting the Java bytecode directly is much faster than compiling it and then executing the result.
Second, better optimization: the more often the JVM executes a method or loop, the more it learns about the code's behavior, so it can apply better optimizations when it eventually compiles it.
2.3 Hot Spot Detection
The JVM detects hot spots with counters, mainly two of them: the method invocation counter and the back-edge (loop) counter.
The counters do not accumulate forever; there is a time window and periodic decay, and corresponding JVM flags (for example -XX:CompileThreshold and -XX:-UseCounterDecay) can tune this behavior.
3 Flink CodeGen
Flink already uses CodeGen in Flink SQL, but the goal there is not CPU performance optimization; it is the translation from the physical plan to Transformations. Performance-oriented codegen features have been developed in the latest versions.
Take CommonExecCalc as an example: its translateToPlanInternal method invokes CodeGen to do the translation.
final CodeGenOperatorFactory<RowData> substituteStreamOperator =
CalcCodeGenerator.generateCalcOperator(
ctx,
inputTransform,
(RowType) getOutputType(),
JavaScalaConversionUtil.toScala(projection),
JavaScalaConversionUtil.toScala(Optional.ofNullable(this.condition)),
retainHeader,
getClass().getSimpleName());
CalcCodeGenerator contains the following code-generation step; the generated code is simply the string inside:
if (onlyFilter) {
s"""
|${if (eagerInputUnboxingCode) ctx.reuseInputUnboxingCode() else ""}
|${filterCondition.code}
|if (${filterCondition.resultTerm}) {
| ${produceOutputCode(inputTerm)}
|}
|""".stripMargin
In the next step, the method that produces the Operator shows code that looks much more like the business code we normally write by hand:
val operatorCode =
j"""
public class $operatorName extends ${abstractBaseClass.getCanonicalName}
implements ${baseClass.getCanonicalName}$endInputImpl {
private final Object[] references;
${ctx.reuseMemberCode()}
public $operatorName(
Object[] references,
${className[StreamTask[_, _]]} task,
${className[StreamConfig]} config,
${className[Output[_]]} output,
${className[ProcessingTimeService]} processingTimeService) throws Exception {
this.references = references;
${ctx.reuseInitCode()}
this.setup(task, config, output);
if (this instanceof ${className[AbstractStreamOperator[_]]}) {
((${className[AbstractStreamOperator[_]]}) this)
.setProcessingTimeService(processingTimeService);
}
}
@Override
public void open() throws Exception {
super.open();
${ctx.reuseOpenCode()}
}
@Override
public void processElement($STREAM_RECORD $ELEMENT) throws Exception {
$inputTypeTerm $inputTerm = ($inputTypeTerm) ${converter(s"$ELEMENT.getValue()")};
${ctx.reusePerRecordCode()}
${ctx.reuseLocalVariableCode()}
${if (lazyInputUnboxingCode) "" else ctx.reuseInputUnboxingCode()}
$processCode
}
$endInput
@Override
public void finish() throws Exception {
${ctx.reuseFinishCode()}
super.finish();
}
@Override
public void close() throws Exception {
super.close();
${ctx.reuseCloseCode()}
}
${ctx.reuseInnerClassDefinitionCode()}
}
""".stripMargin
The code is compiled in CodeGenOperatorFactory: the operatorCode above is first wrapped into a GeneratedOperator, which is then passed into CodeGenOperatorFactory as its generatedClass member:
new GeneratedOperator(operatorName, operatorCode, ctx.references.toArray, ctx.tableConfig)
In CodeGenOperatorFactory's createStreamOperator method, the class is compiled and an instance is created:
public <T extends StreamOperator<OUT>> T createStreamOperator(
StreamOperatorParameters<OUT> parameters) {
return (T)
generatedClass.newInstance(
parameters.getContainingTask().getUserCodeClassLoader(),
generatedClass.getReferences(),
parameters.getContainingTask(),
parameters.getStreamConfig(),
parameters.getOutput(),
processingTimeService);
}
The compilation happens inside newInstance:
public T newInstance(ClassLoader classLoader, Object... args) {
try {
return (T) compile(classLoader).getConstructors()[0].newInstance(args);
} catch (Exception e) {
throw new RuntimeException(
"Could not instantiate generated class '" + className + "'", e);
}
}
The underlying compiler is Janino:
private static <T> Class<T> doCompile(ClassLoader cl, String name, String code) {
checkNotNull(cl, "Classloader must not be null.");
CODE_LOG.debug("Compiling: {} \n\n Code:\n{}", name, code);
SimpleCompiler compiler = new SimpleCompiler();
compiler.setParentClassLoader(cl);
try {
compiler.cook(code);
} catch (Throwable t) {
System.out.println(addLineNumber(code));
throw new InvalidProgramException(
"Table program cannot be compiled. This is a bug. Please file an issue.", t);
}
try {
//noinspection unchecked
return (Class<T>) compiler.getClassLoader().loadClass(name);
} catch (ClassNotFoundException e) {
throw new RuntimeException("Can not load class " + name, e);
}
}
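Outside of Flink, the same pattern can be reproduced with Janino directly. The following is a minimal sketch (the generated class and its contents are invented for illustration): it cooks a source string with SimpleCompiler, loads the resulting class, and instantiates it reflectively, which is essentially what the compile-plus-newInstance path above does:

import java.util.function.Supplier;

import org.codehaus.janino.SimpleCompiler;

public class JaninoSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical generated source, standing in for Flink's operatorCode string.
        String code =
                "public class GeneratedGreeter implements java.util.function.Supplier {\n"
                + "    public Object get() { return \"hello from generated code\"; }\n"
                + "}\n";

        SimpleCompiler compiler = new SimpleCompiler();
        compiler.setParentClassLoader(JaninoSketch.class.getClassLoader());
        compiler.cook(code);  // compile the source string in memory

        Class<?> clazz = compiler.getClassLoader().loadClass("GeneratedGreeter");
        Supplier<?> greeter = (Supplier<?>) clazz.getConstructors()[0].newInstance();
        System.out.println(greeter.get());
    }
}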
4 Spark CodeGen
Spark CodeGen falls into two categories. The first, like Flink above, translates the SQL plan into job code; the second is whole-stage codegen (wholeStage), which fuses multiple operators into a single piece of generated code in order to improve CPU performance.
Advantages: 1. a for loop replaces the iterator chain, completely eliminating the per-row virtual function calls; 2. intermediate data stays in CPU registers, so data access is fast.
Limitations: the generated code still runs on the JVM, so it does not reach native speed and cannot directly exploit high-performance CPU mechanisms such as SIMD.
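To make the "for loop instead of iterators" advantage concrete, here is a hedged illustration in plain Java (names, predicate and data are invented; this is not Spark's actual generated code) contrasting a Volcano-style iterator chain with the fused loop that whole-stage codegen effectively produces for a scan, filter and project pipeline:

import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class FusedLoopSketch {

    // Filter operator as an iterator wrapping the scan iterator.
    static Iterator<int[]> filterIterator(Iterator<int[]> child) {
        return new Iterator<int[]>() {
            int[] pending;
            public boolean hasNext() {
                while (pending == null && child.hasNext()) {
                    int[] row = child.next();        // virtual call into the scan
                    if (row[0] > 10) pending = row;  // the WHERE predicate
                }
                return pending != null;
            }
            public int[] next() {
                if (!hasNext()) throw new NoSuchElementException();
                int[] row = pending;
                pending = null;
                return row;
            }
        };
    }

    // Volcano style: every row pays virtual hasNext()/next() calls per operator in the chain.
    static long volcanoStyle(Iterator<int[]> scan) {
        Iterator<int[]> filtered = filterIterator(scan);
        long result = 0;
        while (filtered.hasNext()) {
            result += filtered.next()[0] * 2L;       // the SELECT projection
        }
        return result;
    }

    // Whole-stage style: scan, filter and projection fused into one loop;
    // no per-row virtual calls, intermediate values stay in locals/registers.
    static long fusedStyle(List<int[]> rows) {
        long result = 0;
        for (int[] row : rows) {
            if (row[0] > 10) {
                result += row[0] * 2L;
            }
        }
        return result;
    }
}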
4.1 CollapseCodegenStages
Whole-stage codegen starts here: QueryExecution installs a list of preparation rules, one of which is CollapseCodegenStages.
// `RemoveRedundantSorts` needs to be added after `EnsureRequirements` to guarantee the same
// number of partitions when instantiating PartitioningCollection.
RemoveRedundantSorts,
DisableUnnecessaryBucketedScan,
ApplyColumnarRulesAndInsertTransitions(
sparkSession.sessionState.columnarRules, outputsColumnar = false),
CollapseCodegenStages()) ++
Its apply method decides, based on the config spark.sql.codegen.wholeStage (default true), whether to insert WholeStageCodegenExec nodes:
def apply(plan: SparkPlan): SparkPlan = {
if (conf.wholeStageEnabled) {
insertWholeStageCodegen(plan)
} else {
plan
}
}
Whole-stage codegen involves two special nodes, WholeStageCodegenExec and InputAdapter; WholeStageCodegenExec is the core class.
Not every node in the operator tree supports CodeGen, so the CollapseCodegenStages rule treats the two kinds of nodes differently. With CodeGen support as the boundary, a run of consecutive CodeGen-capable nodes forms one unit and gets a WholeStageCodegenExec inserted on top, which is responsible for generating the code of that whole block; when a supported node meets an unsupported child, an InputAdapter node is inserted, which effectively acts as a leaf of the WholeStageCodegenExec block. A hypothetical plan is sketched below.
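For example (a made-up plan, purely to illustrate the rule), if the project and filter support codegen but the shuffle exchange below them does not, the tree is rewritten roughly like this:

before:
ProjectExec
  FilterExec
    ShuffleExchangeExec
      FileSourceScanExec

after:
WholeStageCodegenExec
  ProjectExec
    FilterExec
      InputAdapter
        ShuffleExchangeExec
          WholeStageCodegenExec
            FileSourceScanExec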
4.2 CodegenSupport
CodegenSupport is a trait extending SparkPlan that marks CodeGen support. It has a series of implementations, almost all named *Exec, i.e. the physical plan nodes of the corresponding SQL operators (for example the Filter node corresponds to FilterExec). Its core is two pairs of methods, produce/doProduce and consume/doConsume, which each implementing class provides.
The overall code-generation flow around WholeStageCodegenExec is shown in the diagram below: starting from the parent node, produce keeps calling down into the children, and the children then call back up into their parents' consume, until the whole piece of code has been generated.
The code that CodeGen has to emit is defined inside these two pairs of methods.
 *   WholeStageCodegen       Plan A               FakeInput        Plan B
 * =========================================================================
 *
 * -> execute()
 *     |
 *  doExecute() --------->   inputRDDs() -------> inputRDDs() ------> execute()
 *     |
 *     +----------------->   produce()
 *                             |
 *                          doProduce()  -------> produce()
 *                                                   |
 *                                                doProduce()
 *                                                   |
 *                         doConsume() <--------- consume()
 *                             |
 *  doConsume()  <--------  consume()
produce is defined in the parent trait CodegenSupport:
final def produce(ctx: CodegenContext, parent: CodegenSupport): String = executeQuery {
this.parent = parent
ctx.freshNamePrefix = variablePrefix
s"""
|${ctx.registerComment(s"PRODUCE: ${this.simpleString(conf.maxToStringFields)}")}
|${doProduce(ctx)}
""".stripMargin
}
doProduce is implemented by the subclasses and is where the actual code generation happens; most subclasses simply delegate to their child's produce:
protected override def doProduce(ctx: CodegenContext): String = {
child.asInstanceOf[CodegenSupport].produce(ctx, this)
}
4.3 WholeStageCodegenExec
doExecute is the execution entry point. It first calls doCodeGen to produce the source and then tries to compile it; if compilation fails, execution falls back to the non-CodeGen path:
override def doExecute(): RDD[InternalRow] = {
val (ctx, cleanedSource) = doCodeGen()
// try to compile and fallback if it failed
val (_, compiledCodeStats) = try {
CodeGenerator.compile(cleanedSource)
} catch {
case NonFatal(_) if !Utils.isTesting && conf.codegenFallback =>
// We should already saw the error message
logWarning(s"Whole-stage codegen disabled for plan (id=$codegenStageId):\n $treeString")
return child.execute()
}
Besides compilation failure, there is also a check on the size of the generated method body; if it exceeds the limit, execution falls back as well:
// Check if compiled code has a too large function
if (compiledCodeStats.maxMethodCodeSize > conf.hugeMethodLimit) {
logInfo(s"Found too long generated codes and JIT optimization might not work: " +
s"the bytecode size (${compiledCodeStats.maxMethodCodeSize}) is above the limit " +
s"${conf.hugeMethodLimit}, and the whole-stage codegen was disabled " +
s"for this plan (id=$codegenStageId). To avoid this, you can raise the limit " +
s"`${SQLConf.WHOLESTAGE_HUGE_METHOD_LIMIT.key}`:\n$treeString")
return child.execute()
}
Spark ultimately still runs on RDDs, so once the code has been compiled, the result is turned into an RDD:
// Even though rdds is an RDD[InternalRow] it may actually be an RDD[ColumnarBatch] with
// type erasure hiding that. This allows for the input to a code gen stage to be columnar,
// but the output must be rows.
val rdds = child.asInstanceOf[CodegenSupport].inputRDDs()
The RDD side has a limitation as well: at most two input RDDs are supported:
assert(rdds.size <= 2, "Up to two input RDDs can be supported")
if (rdds.length == 1) {
rdds.head.mapPartitionsWithIndex { (index, iter) =>
val (clazz, _) = CodeGenerator.compile(cleanedSource)
val buffer = clazz.generate(references).asInstanceOf[BufferedRowIterator]
buffer.init(index, Array(iter))
new Iterator[InternalRow] {
override def hasNext: Boolean = {
val v = buffer.hasNext
if (!v) durationMs += buffer.durationMs()
v
}
override def next: InternalRow = buffer.next()
}
}
} else {
// Right now, we support up to two input RDDs.
rdds.head.zipPartitions(rdds(1)) { (leftIter, rightIter) =>
Iterator((leftIter, rightIter))
// a small hack to obtain the correct partition index
}.mapPartitionsWithIndex { (index, zippedIter) =>
val (leftIter, rightIter) = zippedIter.next()
val (clazz, _) = CodeGenerator.compile(cleanedSource)
val buffer = clazz.generate(references).asInstanceOf[BufferedRowIterator]
buffer.init(index, Array(leftIter, rightIter))
new Iterator[InternalRow] {
override def hasNext: Boolean = {
val v = buffer.hasNext
if (!v) durationMs += buffer.durationMs()
v
}
override def next: InternalRow = buffer.next()
}
}
}
4.3.1 doCodeGen
It first calls the child's produce, which generates the code following the flow described earlier:
val code = child.asInstanceOf[CodegenSupport].produce(ctx, this)
The generated code does the per-row processing and is placed into the processNext method of the final class:
// main next function.
ctx.addNewFunction("processNext",
s"""
protected void processNext() throws java.io.IOException {
${code.trim}
}
""", inlineToOuterClass = true)
After that, the source of the whole generated class is assembled; generatedClassName() normally yields GeneratedIterator (possibly with the codegen stage id appended), and this generated iterator is what ultimately gets executed:
val className = generatedClassName()
val source = s"""
public Object generate(Object[] references) {
return new $className(references);
}
${ctx.registerComment(
s"""Codegened pipeline for stage (id=$codegenStageId) |${this.treeString.trim}""".stripMargin,
"wsc_codegenPipeline")}
${ctx.registerComment(s"codegenStageId=$codegenStageId", "wsc_codegenStageId", true)}
final class $className extends ${classOf[BufferedRowIterator].getName} {
private Object[] references;
private scala.collection.Iterator[] inputs;
${ctx.declareMutableStates()}
public $className(Object[] references) {
this.references = references;
}
public void init(int index, scala.collection.Iterator[] inputs) {
partitionIndex = index;
this.inputs = inputs;
${ctx.initMutableStates()}
${ctx.initPartition()}
}
${ctx.emitExtraCode()}
${ctx.declareAddedFunctions()}
}
""".trim
4.3.2 Generating the code
Generating the code is exactly the produce-down / consume-up flow described earlier. Take the simplest select … from … where query as an example.
The doProduce of the first two operators simply delegates to the child's produce, down to FileSourceScanExec, whose doProduce comes from the parent trait InputRDDCodegen; its core job is to emit the while loop shown below, which calls consume to call back up into the parent operator:
s"""
| while ($limitNotReachedCond $input.hasNext()) {
| InternalRow $row = (InternalRow) $input.next();
| ${updateNumOutputRowsMetrics}
| ${consume(ctx, outputVars, if (createUnsafeProjection) null else row).trim}
| ${shouldStopCheckCode}
| }
""".stripMargin
consume is the logic of the parent trait CodegenSupport. As mentioned earlier, an over-long method body forces a fallback to the non-CodeGen mode, so consume may split the consume logic into a separate function; either way it ends up calling the parent operator's doConsume:
// Under certain conditions, we can put the logic to consume the rows of this operator into
// another function. So we can prevent a generated function too long to be optimized by JIT.
// The conditions:
// 1. The config "spark.sql.codegen.splitConsumeFuncByOperator" is enabled.
// 2. `inputVars` are all materialized. That is guaranteed to be true if the parent plan uses
// all variables in output (see `requireAllOutput`).
// 3. The number of output variables must less than maximum number of parameters in Java method
// declaration.
val confEnabled = conf.wholeStageSplitConsumeFuncByOperator
val requireAllOutput = output.forall(parent.usedInputs.contains(_))
val paramLength = CodeGenerator.calculateParamLength(output) + (if (row != null) 1 else 0)
val consumeFunc = if (confEnabled && requireAllOutput
&& CodeGenerator.isValidParamLength(paramLength)) {
constructDoConsumeFunction(ctx, inputVars, row)
} else {
parent.doConsume(ctx, inputVars, rowVar)
}
Once the call into the parent returns, consume assembles the resulting code structure:
s"""
|${ctx.registerComment(s"CONSUME: ${parent.simpleString(conf.maxToStringFields)}")}
|$evaluated
|$consumeFunc
""".stripMargin
The parent FilterExec's doConsume builds its own logic and then calls consume again:
// Note: wrap in "do { } while(false);", so the generated checks can jump out with "continue;"
s"""
|do {
| $predicateCode
| $numOutput.add(1);
| ${consume(ctx, resultVars)}
|} while(false);
""".stripMargin
That consume again goes through the CodegenSupport logic above and ends up in ProjectExec's doConsume:
// Evaluation of non-deterministic expressions can't be deferred.
val nonDeterministicAttrs = projectList.filterNot(_.deterministic).map(_.toAttribute)
s"""
|// common sub-expressions
|${evaluateVariables(localValInputs)}
|$subExprsCode
|${evaluateRequiredVariables(output, resultVars, AttributeSet(nonDeterministicAttrs))}
|${consume(ctx, resultVars)}
""".stripMargin
4.4 Compilation
As mentioned above, the generated code is compiled before it runs, and the compiler used is again Janino.
Janino is a super-small, super-fast open-source Java compiler ("Janino is a super-small, super-fast Java™ compiler."). Besides compiling Java source files into bytecode like javac, Janino can compile Java expressions, blocks, classes and source files held in memory, load the bytecode and execute it directly in the JVM. Janino is not a development tool but an embedded compiler used at run time.
The official usage example is shown below; essentially you call Janino's API to compile a code string directly:
import java.lang.reflect.InvocationTargetException;

import org.codehaus.commons.compiler.CompileException;
import org.codehaus.janino.ScriptEvaluator;

public class Main {
public static void main(String[] args) throws CompileException, NumberFormatException, InvocationTargetException {
ScriptEvaluator se = new ScriptEvaluator();
se.cook(
""
+ "static void method1() {\n"
+ " System.out.println(1);\n"
+ "}\n"
+ "\n"
+ "method1();\n"
+ "method2();\n"
+ "\n"
+ "static void method2() {\n"
+ " System.out.println(2);\n"
+ "}\n"
);
se.evaluate();
}
}