Spark Shuffle Tracking: How It Works (1)

Latest recommended article published on 2024-05-18 15:23:51

2401_84183545


Article link: https://blog.csdn.net/2401_84183545/article/details/138264108


ExecutorMonitor creates a Tracker for each executor to track that executor's state.

private val executors = new ConcurrentHashMap[String, Tracker]()

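A scheduled task periodically looks for executors that have timed out and then handles them.

The main logic of the timedOutExecutors method is to iterate over executors. If an executor has no active shuffle and the current time is later than the executor's deadline, timeoutAt, the executor can be released safely.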
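The check described above can be sketched in a few lines. This is a simplified illustration rather than Spark's actual code: the Tracker here is a hypothetical two-field stand-in, the function takes the current time as a parameter, and the example assumes Scala 2.13 or later for scala.jdk.CollectionConverters.

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.jdk.CollectionConverters._

object TimedOutSketch {
  // Hypothetical, simplified stand-in for the per-executor Tracker.
  final case class Tracker(timeoutAt: Long, hasActiveShuffle: Boolean)

  // Executor ID -> Tracker, mirroring the map shown in the article.
  val executors = new ConcurrentHashMap[String, Tracker]()

  // IDs of executors whose deadline has passed and that hold no active shuffle.
  def timedOutExecutors(now: Long): Seq[String] =
    executors.asScala.collect {
      case (id, t) if !t.hasActiveShuffle && now > t.timeoutAt => id
    }.toSeq.sorted
}
```

An executor with an active shuffle is skipped even when its deadline has passed, which is the behavior the next section explains.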

Why can't an executor that holds active shuffle data be killed?

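The shuffle process:

The MapTask writes its shuffle output to local disk and reports the status to the Driver.
The Reduce Task fetches the shuffle status from the Driver and reads the address of each piece of shuffle data from it.
The Reduce Task then connects to the corresponding executor to fetch the shuffle data.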

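If the executor hosting the MapTask is killed after the reduce side has fetched the shuffle status, the Reduce Task can no longer fetch the shuffle data.

What if we run the decommission logic and upload the MapTask's shuffle data to distributed storage such as BOS?

That does not work either. The reducer may already have taken the shuffle status, and the copy it holds does not record that the shuffle data now lives on distributed storage.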
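The stale-status problem can be shown with a toy model. Everything in this sketch is hypothetical (the Driver class, the Location types, the fetch function); it only illustrates that a reducer holding an old snapshot of the shuffle status cannot see a location registered later.

```scala
object StaleShuffleStatusSketch {
  // Hypothetical location of one map output.
  sealed trait Location
  final case class OnExecutor(executorId: String) extends Location
  final case class OnExternalStorage(path: String) extends Location

  // Toy driver: map partition ID -> current location of its shuffle output.
  final class Driver {
    private var status = Map.empty[Int, Location]
    def register(mapId: Int, loc: Location): Unit = status = status.updated(mapId, loc)
    // Reducers receive an immutable snapshot; later updates do not reach it.
    def shuffleStatus: Map[Int, Location] = status
  }

  // A reducer can only fetch from the location recorded in the snapshot it holds.
  def fetch(snapshot: Map[Int, Location], mapId: Int, alive: Set[String]): Either[String, String] =
    snapshot.get(mapId) match {
      case Some(OnExecutor(id)) if alive(id) => Right(s"fetched map $mapId from executor $id")
      case Some(OnExecutor(id))              => Left(s"fetch failed: executor $id is gone")
      case Some(OnExternalStorage(path))     => Right(s"fetched map $mapId from $path")
      case None                              => Left(s"no status for map $mapId")
    }
}
```

If a reducer takes its snapshot while the output is still on an executor, and the output is then migrated and the executor killed, fetch on the old snapshot returns a Left, while a fresh snapshot would succeed.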

References: ExecutorMonitor, ExecutorAllocationManager

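Updating executor state

ExecutorMonitor implements the SparkListener interface, so its callbacks run whenever a Job, Stage, Task, and so on starts or ends.

Take hasActiveShuffle as an example.
Each executor keeps a set, shuffleIds, of the shuffle data it holds. When the set is empty, the executor has no shuffle data.

addShuffle is called from onTaskEnd and onBlockUpdated to add entries to shuffleIds.

Entries are removed from shuffleIds in the following cases.

Through the driver-side ContextCleaner, which is triggered once the ShuffleRDD is reachable only through a weakReference.
Through the rdd.cleanShuffleDependencies method, although this method is used only in org.apache.spark.ml.recommendation.ALS.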
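The bookkeeping above can be sketched as follows. The names addShuffle and removeShuffle come from the article; the Monitor class, the simplified onTaskEnd and shuffleCleaned signatures, and hasShuffleData are assumptions made for illustration and do not match Spark's real listener signatures.

```scala
import scala.collection.mutable

object ShuffleIdsSketch {
  // Per-executor state: the IDs of shuffles whose data lives on this executor.
  final class Tracker {
    private val shuffleIds = mutable.HashSet.empty[Int]
    def addShuffle(id: Int): Unit = shuffleIds += id
    def removeShuffle(id: Int): Unit = shuffleIds -= id
    def hasShuffleData: Boolean = shuffleIds.nonEmpty
  }

  // Hypothetical, simplified monitor driven by two kinds of events.
  final class Monitor {
    private val executors = mutable.HashMap.empty[String, Tracker]

    // A shuffle map task finished on execId, so that executor now holds shuffle data.
    def onTaskEnd(execId: String, shuffleId: Int): Unit =
      executors.getOrElseUpdate(execId, new Tracker).addShuffle(shuffleId)

    // The cleaner reported that shuffleId is no longer needed anywhere.
    def shuffleCleaned(shuffleId: Int): Unit =
      executors.values.foreach(_.removeShuffle(shuffleId))

    def hasShuffleData(execId: String): Boolean =
      executors.get(execId).exists(_.hasShuffleData)
  }
}
```

Until a cleanup event arrives for every shuffle an executor holds, its set stays non-empty and the shuffle timeout keeps applying to it.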

How timeoutAt is computed

Summary: timeoutAt is determined by the largest of three values: the idle timeout, spark.dynamicAllocation.cachedExecutorIdleTimeout, and spark.dynamicAllocation.shuffleTracking.timeout.

Detailed logic:
timeoutAt is recomputed when certain events occur, such as onBlockUpdated, onUnpersistRDD, updateRunningTasks, removeShuffle, and updateActiveShuffles.
The computation is:
When the executor has running tasks, timeoutAt is Long.MaxValue.
Otherwise it is max(_cacheTimeout, _shuffleTimeout, idleTimeoutNs).
_cacheTimeout: 0 if the executor holds no cached data, otherwise the value of spark.dynamicAllocation.cachedExecutorIdleTimeout (default Long.MaxValue).
_shuffleTimeout: 0 if the executor holds no shuffle data, otherwise the value of spark.dynamicAllocation.shuffleTracking.timeout (default Long.MaxValue).
idleTimeoutNs: the value of spark.dynamicAllocation.executorIdleTimeout.
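The rule as stated here can be written out as a small function. This sketch follows the description in this article, not the Spark source, so the real ExecutorMonitor may differ in detail. The Timeouts case class, the idleStartNs parameter (the moment the executor became idle, assumed non-negative), and the overflow guard are assumptions added to turn a timeout into a deadline.

```scala
object TimeoutAtSketch {
  // Hypothetical holder for the three configured timeouts, in nanoseconds.
  final case class Timeouts(idleTimeoutNs: Long, cacheTimeoutNs: Long, shuffleTimeoutNs: Long)

  def timeoutAt(
      idleStartNs: Long,
      hasRunningTasks: Boolean,
      hasCache: Boolean,
      hasShuffle: Boolean,
      t: Timeouts): Long =
    if (hasRunningTasks) {
      Long.MaxValue // a busy executor never times out
    } else {
      val cacheTimeout = if (hasCache) t.cacheTimeoutNs else 0L
      val shuffleTimeout = if (hasShuffle) t.shuffleTimeoutNs else 0L
      val timeout = math.max(math.max(cacheTimeout, shuffleTimeout), t.idleTimeoutNs)
      val deadline = idleStartNs + timeout
      // Adding to a huge timeout can overflow to a negative value; treat that as "never".
      if (deadline >= 0) deadline else Long.MaxValue
    }
}
```

With an idle timeout of 60 and a shuffle timeout of 300, an idle executor that became idle at 1000 and holds shuffle data gets a deadline of 1300, while one holding nothing gets 1060.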

3. Testing

Test command

spark-shell \
 --conf spark.dynamicAllocation.enabled=true \
 --conf spark.dynamicAllocation.initialExecutors=2 \
 --conf spark.dynamicAllocation.maxExecutors=400
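For experimenting with the timeouts discussed earlier, a helper like the one below can render a spark-shell command from a list of settings. The configuration keys are real Spark keys, but they and their values are illustrative assumptions and are not part of the test command shown in this article; the helper itself is hypothetical.

```scala
object ShuffleTrackingConfSketch {
  // Real Spark configuration keys; the values are illustrative assumptions.
  val settings: Seq[(String, String)] = Seq(
    "spark.dynamicAllocation.enabled" -> "true",
    "spark.dynamicAllocation.shuffleTracking.enabled" -> "true",
    "spark.dynamicAllocation.executorIdleTimeout" -> "60s",
    "spark.dynamicAllocation.shuffleTracking.timeout" -> "300s"
  )

  // Renders one --conf flag per setting, joined with shell line continuations.
  def toSparkShellCommand(conf: Seq[(String, String)]): String =
    ("spark-shell" +: conf.map { case (k, v) => s"--conf $k=$v" }).mkString(" \\\n  ")
}
```

Printing toSparkShellCommand(settings) gives a multi-line command that can be pasted into a terminal; a short shuffle timeout makes it easier to observe executors being released after their shuffle data expires.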

