Container killed by YARN for exceeding memory limits

SunnyRivers

Last modified: 2023-07-20 11:21:55


Category: 分析别人的Bug让自己“零”Bug (Analyzing Other People's Bugs to Reach "Zero" Bugs). Tags: spark bug memory limits killed

First published: 2023-07-14 17:42:43

Article link: https://blog.csdn.net/Android_xue/article/details/131722103


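When a Spark job runs on YARN, a container may be killed because it runs out of memory. Remedies include disabling `yarn.nodemanager.vmem-check-enabled`, increasing `spark.yarn.executor.memoryOverhead`, lowering parallelism, handling data skew, and adjusting RDD caching and the memory fractions. Understanding YARN parameters such as `yarn.nodemanager.resource.memory-mb` is also essential for tuning the configuration.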
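As a rough illustration of how two of these remedies might be passed to a job, the sketch below renders a settings dictionary as `spark-submit` arguments. The values, the helper `to_submit_args`, and the file name `your_job.py` are hypothetical. Mapping "lowering parallelism" to `spark.executor.cores` is an assumption rather than something the article states, and Spark 2.3 and later spell the overhead key `spark.executor.memoryOverhead`.

```python
# Hypothetical values for illustration only; tune them for your own cluster.
spark_conf = {
    # Off-heap headroom per executor, in MiB.
    # Spark 2.3+ uses the key spark.executor.memoryOverhead instead.
    "spark.yarn.executor.memoryOverhead": "4096",
    # Assumption: fewer concurrent tasks per executor is one way to
    # "lower parallelism" and reduce memory pressure per container.
    "spark.executor.cores": "2",
}


def to_submit_args(conf):
    """Render a conf dict as a flat list of spark-submit --conf arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


# Print the assembled command line; your_job.py is a placeholder name.
print(" ".join(["spark-submit"] + to_submit_args(spark_conf) + ["your_job.py"]))
```

Note that `yarn.nodemanager.vmem-check-enabled` is a NodeManager setting that lives in `yarn-site.xml`, so it is not something this kind of per-job argument list can change.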


Bug details

WARN TaskSetManager: Lost task 49.2 in stage 6.0 (TID xxx, 
xxx.xxx.xxx.compute.internal): ExecutorLostFailure (executor 16 exited caused by one
 of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 
 18 GB of 18 GB physical memory used. Consider boosting 
 spark.yarn.executor.memoryOverhead or disabling yarn.nodemanager.vmem-check-enabled...
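To make the "18 GB of 18 GB" figure in the message concrete, here is a minimal sketch of how the size of an executor container is derived: the executor heap plus the memory overhead. The constants follow Spark's documented defaults in recent releases (10% of executor memory, with a 384 MiB floor). The function names and the 16 GiB + 2 GiB split are hypothetical, chosen only because they add up to 18 GiB; the log does not reveal the actual settings.

```python
MIN_OVERHEAD_MIB = 384   # documented floor for the default overhead
OVERHEAD_FACTOR = 0.10   # documented default fraction of executor memory


def default_overhead_mib(executor_memory_mib):
    """Overhead used when no explicit memoryOverhead is configured."""
    return max(MIN_OVERHEAD_MIB, int(executor_memory_mib * OVERHEAD_FACTOR))


def container_request_mib(executor_memory_mib, overhead_mib=None):
    """Memory requested from YARN for one executor: heap plus overhead."""
    if overhead_mib is None:
        overhead_mib = default_overhead_mib(executor_memory_mib)
    return executor_memory_mib + overhead_mib


# Hypothetical setup: 16 GiB heap with an explicit 2 GiB overhead.
explicit = container_request_mib(16 * 1024, overhead_mib=2 * 1024)
print(explicit / 1024)   # size in GiB

# Same heap with the default overhead (10% of the heap).
print(container_request_mib(16 * 1024))
```

YARN may additionally round the request up to its scheduler's allocation increment, so the limit it enforces can be slightly larger than this sum.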

Root cause of the bug

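A container on a YARN NodeManager ran out of memory. In other words, the data held in that container grew too large and exceeded the container's memory limit.
So what is stored in a container's memory, and what pushes it past the limit?
The figure below shows how the different memory regions relate to one another:
[Figure: relationship between the memory regions in a container]
As the figure shows, a container's memory is made up of two parts: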