Spark Tuning Study Notes (14): AQE

Latest recommended article published on 2025-04-23 10:50:17

Ale_77

Category: Spark Tuning Study Notes. Tags: spark, learning, big data

Article link: https://blog.csdn.net/yuliangwan_jiangyan/article/details/137779689

Automatic partition coalescing

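spark.sql.adaptive.enabled                                # enable AQE
spark.sql.adaptive.coalescePartitions.enabled             # coalesce small shuffle partitions automatically
spark.sql.adaptive.coalescePartitions.initialPartitionNum # initial number of shuffle partitions before coalescing; defaults to spark.sql.shuffle.partitions
spark.sql.adaptive.coalescePartitions.minPartitionNum     # minimum number of partitions after coalescing; when unset it defaults to the cluster's default parallelism, not spark.sql.shuffle.partitions
spark.sql.adaptive.advisoryPartitionSizeInBytes           # advisory size of each partition after coalescing, in bytes (default 64MB)
spark.sql.adaptive.shuffle.targetPostShuffleInputSize     # deprecated alias of spark.sql.adaptive.advisoryPartitionSizeInBytes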
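
To make the effect of the advisory size concrete, here is a minimal pure-Python sketch of greedy coalescing: adjacent shuffle partitions are packed together until the next one would push the group past the target. This is a simplified model written for these notes, not Spark's actual implementation; the function name `coalesce_partitions` and the way `min_partitions` shrinks the target are assumptions.

```python
from typing import List


def coalesce_partitions(sizes: List[int], advisory_size: int,
                        min_partitions: int = 1) -> List[List[int]]:
    """Greedily pack adjacent shuffle partitions up to roughly advisory_size bytes.

    Returns groups of original partition indices; each group stands for one
    coalesced partition. Simplified model, not Spark's real code.
    """
    if not sizes:
        return []
    # Assumption: shrink the target when the advisory size would leave too few
    # groups (a rough stand-in for minPartitionNum, not a hard guarantee).
    total = sum(sizes)
    target = min(advisory_size, max(1, -(-total // min_partitions)))

    groups: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for idx, size in enumerate(sizes):
        # Close the current group when adding this partition would exceed the target.
        if current and current_size + size > target:
            groups.append(current)
            current, current_size = [], 0
        current.append(idx)
        current_size += size
    groups.append(current)
    return groups
```

Only adjacent partitions are merged, so a single large partition stays on its own and is left for skew handling rather than coalescing.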

Dynamic resource allocation

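Dynamic allocation tries to request as many executors as it can. When cluster resources are limited, lower executorAllocationRatio appropriately and cap the number of executors with maxExecutors.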

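spark.sql.adaptive.enabled                       # enable AQE
spark.dynamicAllocation.enabled                  # enable dynamic allocation; default false
spark.dynamicAllocation.shuffleTracking.enabled  # track shuffle files on executors; default true since Spark 3.4 (false in earlier 3.x releases)
spark.dynamicAllocation.initialExecutors         # number of executors requested at startup
spark.dynamicAllocation.maxExecutors             # upper bound on the number of executors
spark.dynamicAllocation.minExecutors             # lower bound on the number of executors
spark.dynamicAllocation.executorAllocationRatio  # fraction of full task parallelism to request; default 1 (maximum parallelism)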
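
The interaction between executorAllocationRatio and the min/max bounds can be sketched as a small calculation. This is an approximation of how the target is derived, written for illustration; the function name `target_executors` and its parameters are hypothetical, and real scheduling also depends on backlog timeouts that are not modelled here.

```python
import math


def target_executors(pending_and_running_tasks: int,
                     executor_cores: int,
                     task_cpus: int = 1,
                     allocation_ratio: float = 1.0,
                     min_executors: int = 0,
                     max_executors: int = 2**31 - 1) -> int:
    """Estimate how many executors dynamic allocation would aim for."""
    # How many tasks one executor can run at the same time.
    tasks_per_executor = max(1, executor_cores // task_cpus)
    # Executors needed to run every task at once, scaled down by the ratio.
    needed = math.ceil(pending_and_running_tasks * allocation_ratio / tasks_per_executor)
    # Clamp the result to the configured bounds.
    return max(min_executors, min(needed, max_executors))
```

For example, with 1000 tasks and 4 cores per executor, a ratio of 1 asks for 250 executors while a ratio of 0.5 asks for 125; maxExecutors then caps whatever the ratio produces.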

Dynamically switching join strategies

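spark.sql.adaptive.enabled                      # enable AQE
spark.sql.adaptive.localShuffleReader.enabled   # when AQE converts a sort-merge join into a broadcast hash join, read the shuffle files locally to save network traffic (default true)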
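
The switch itself is size based: once a shuffle stage finishes, AQE knows the real size of each join side and can fall back to a broadcast hash join if one side is small enough. The sketch below models only that size check against spark.sql.autoBroadcastJoinThreshold (default 10MB); the function name `pick_join_strategy` is hypothetical, and restrictions that depend on the join type are deliberately ignored.

```python
# Default value of spark.sql.autoBroadcastJoinThreshold (10MB).
DEFAULT_BROADCAST_THRESHOLD = 10 * 1024 * 1024


def pick_join_strategy(left_bytes: int, right_bytes: int,
                       threshold: int = DEFAULT_BROADCAST_THRESHOLD) -> str:
    """Return the join strategy a purely size-based rule would choose at runtime."""
    if threshold < 0:
        # A negative threshold (for example -1) disables broadcast joins.
        return "sort_merge_join"
    if min(left_bytes, right_bytes) <= threshold:
        # The smaller side fits under the threshold, so it can be broadcast.
        return "broadcast_hash_join"
    return "sort_merge_join"
```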

Dynamically handling data skew

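spark.sql.adaptive.enabled                                  # enable AQE
spark.sql.adaptive.skewJoin.enabled                         # enable skew join detection
spark.sql.adaptive.skewJoin.skewedPartitionFactor           # default 5; a partition is split only if it is larger than 5 times the median partition size
spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes # default 256MB; the partition must also be larger than this value before it is split
spark.sql.adaptive.advisoryPartitionSizeInBytes             # tells Spark the target size of each partition after splitting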
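
The two skew conditions are combined with AND, which is easy to miss. Here is a minimal sketch of the check using the default values listed above. It is a simplified model: the function name `plan_skew_splits` is hypothetical, the split count is estimated from the advisory size alone, and Python's `statistics.median` may differ slightly from the median Spark computes for an even number of partitions.

```python
import math
import statistics
from typing import Dict, List

MB = 1024 * 1024


def plan_skew_splits(sizes: List[int], factor: float = 5,
                     threshold: int = 256 * MB,
                     advisory_size: int = 64 * MB) -> Dict[int, int]:
    """Map each skewed partition index to the number of pieces to split it into."""
    if not sizes:
        return {}
    median = statistics.median(sizes)
    splits: Dict[int, int] = {}
    for idx, size in enumerate(sizes):
        # Both conditions must hold: relative (factor x median) and absolute (threshold).
        if size > factor * median and size > threshold:
            # Assumption: aim for pieces of about advisory_size bytes each.
            splits[idx] = math.ceil(size / advisory_size)
    return splits
```

With partitions of 100MB, 120MB, 90MB, 1024MB and 110MB, only the 1024MB partition passes both checks. A 200MB partition next to 10MB neighbours is left alone because it stays under the 256MB threshold.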

When partition coalescing and skew handling are both enabled, Spark coalesces partitions first and then handles the skew.

Dynamic partition pruning (DPP)

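spark.sql.optimizer.dynamicPartitionPruning.enabled # enable DPP (default true)

# the join condition must include a partition column of the table to be pruned
# the other table must have at least one filter condition
# the join must be an inner join, left join, or right join (for a left join the pruned table must be on the right side; for a right join, on the left)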
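
The idea behind these conditions can be shown with a toy model: the filter on the small table produces a set of join keys, and only the partitions of the large table that match those keys are scanned. Everything in this sketch is hypothetical (the tables `FACT_PARTITIONS` and `DIM_ROWS`, and the function `join_with_pruning`); it illustrates the pruning idea in plain Python and does not use Spark.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical fact table, laid out as partition value -> rows (order_id, amount).
FACT_PARTITIONS: Dict[str, List[Tuple[int, float]]] = {
    "2024-01-01": [(1, 10.0), (2, 20.0)],
    "2024-01-02": [(3, 30.0)],
    "2024-01-03": [(4, 40.0), (5, 50.0)],
}

# Hypothetical dimension table: (partition key value, is_holiday flag).
DIM_ROWS: List[Tuple[str, bool]] = [
    ("2024-01-01", True),
    ("2024-01-02", False),
    ("2024-01-03", False),
]


def join_with_pruning(dim_filter: Callable[[Tuple[str, bool]], bool]):
    """Join fact to dim on the partition key, scanning only matching partitions."""
    # Step 1: evaluate the filter on the small side and collect its join keys.
    keys = {day for (day, flag) in DIM_ROWS if dim_filter((day, flag))}
    # Step 2: scan only the fact partitions whose key survived the filter.
    scanned = [day for day in FACT_PARTITIONS if day in keys]
    rows = [(day, order_id, amount)
            for day in scanned
            for (order_id, amount) in FACT_PARTITIONS[day]]
    return scanned, rows
```

Without a filter on the dimension side every key survives and nothing is pruned, which is why the second condition above requires at least one filter on the other table.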