Spark 之 WholeStageCodegen

zhixingheyi_tian

已于 2025-04-07 17:33:41 修改

阅读量2.8k

点赞数

分类专栏： spark 文章标签：大数据

于 2022-06-25 13:46:20 首次发布

本文链接：https://blog.csdn.net/zhixingheyi_tian/article/details/125458561

版权

spark 专栏收录该内容

148 篇文章 ¥19.90 ¥99.00

订阅专栏

超级会员免费看

CodeGen framework

CodegenSupport(接口)
相邻Operator通过Produce-Consume模式生成代码。
Produce生成整体处理的框架代码，例如aggregation生成的代码框架如下:

if (!initialized) {
 # create a hash map, then build the aggregation hash map
 # call child.produce()
 initialized = true;
}
while (hashmap.hasNext()) {
 row = hashmap.next();
 # build the aggregation results
 # create variables for results
 # call consume(), which will call parent.doConsume()
 if (shouldStop()) return;
}

Consume生成当前节点处理上游输入的Row的逻辑。如Filter生成代码如下:

# code to evaluate the predicate expression, result is isNull1 and value2
if (!isNull1 && value2) {
 # call consume(), which will call parent.doConsume()
}

了解本专栏