Continuously updated…
Heap OOM
A Spark program is essentially a distributed JVM application, so a Java heap OOM is likely whenever memory is sized incorrectly, leaks, is used improperly, or is poorly managed.
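Most of these failures trace back to how the main memory knobs are sized. A minimal spark-submit sketch for reference (the values are illustrative only, and com.example.MyJob / my-job.jar are placeholders):

spark-submit \
  --driver-memory 4g \
  --executor-memory 6g \
  --conf spark.yarn.executor.memoryOverhead=2048 \
  --class com.example.MyJob \
  my-job.jar

--driver-memory sets the driver JVM heap (spark.driver.memory), --executor-memory sets each executor's JVM heap (spark.executor.memory), and memoryOverhead is the per-executor off-heap allowance; each failure mode below maps to one of these three.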
Driver heap OOM
If the driver log contains OutOfMemoryError entries, the job failed because of a driver OOM. Common errors include:
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: GC overhead limit exceeded
If the driver log contains no OutOfMemoryError entries but does show the following, a driver heap OOM is still very likely:
ERROR: org.apache.spark.deploy.yarn.ApplicationMaster: RECEIVED SIGNAL TERM
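When the driver heap is the bottleneck, the usual first step is raising spark.driver.memory; if the OOM happens while pulling results back to the driver (e.g. a large collect()), spark.driver.maxResultSize is also worth checking. A hedged sketch with illustrative values:

spark-submit \
  --driver-memory 4g \
  --conf spark.driver.maxResultSize=2g \
  <rest of the job arguments>

Note that spark.driver.memory cannot be set through SparkConf inside the application in client mode, because the driver JVM has already started by then; set it via spark-submit or spark-defaults.conf.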
Executor heap OOM
If the failed executor's page shows any of the following errors, consider whether data skew caused the OOM:
java.lang.OutOfMemoryError
ExecutorLostFailure (executor 98 exited caused by one of the running tasks) Reason: Container marked as failed: container_xxx on host: xxx. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
ExecutorLostFailure (executor 87 exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 66666ms.
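Before simply adding executor memory, it is worth checking whether a few hot keys are funneling most of the records into a single task. A minimal Scala sketch (assuming a pair RDD named pairs; the 1% sample fraction is arbitrary):

// Count records per key on a small sample to surface hot keys cheaply.
val hotKeys = pairs
  .sample(withReplacement = false, fraction = 0.01)
  .map { case (key, _) => (key, 1L) }
  .reduceByKey(_ + _)
  .sortBy(_._2, ascending = false)
  .take(10)
hotKeys.foreach(println)

If one key dominates, salting it (appending a random suffix to the key before the shuffle and stripping it afterwards) spreads that key's records across many tasks.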
Insufficient off-heap memory
The main cause of this failure is insufficient off-heap (overhead) memory.
Add the parameter: spark.yarn.executor.memoryOverhead=2048
ExecutorLostFailure (executor 90 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 6.52 GB of 6.50 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
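This overhead region covers what the executor uses outside the JVM heap (thread stacks, interned strings, Netty/direct buffers), and YARN kills the container once heap plus overhead exceeds the container limit, which is the 6.50 GB in the message above. Since Spark 2.3 the key spark.yarn.executor.memoryOverhead is deprecated in favor of spark.executor.memoryOverhead; a hedged example with an illustrative value (interpreted as MiB unless a unit suffix is given):

spark-submit \
  --conf spark.executor.memoryOverhead=2048 \
  <rest of the job arguments>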