I. Initial job has not accepted any resources
A job submitted to YARN showed the RUNNING state, but the driver log kept printing the following warning:
WARN YarnScheduler:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
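On the surface this looks like a shortage of cluster resources, but the YARN resource management UI showed plenty of free CPU and memory, and every individual node had enough as well, so a resource shortage was clearly not the problem. Moreover, after the Spark job was submitted to the YARN cluster, the resource request succeeded and the Driver ran normally; it simply could not dispatch tasks to the Executors.
Every line of investigation came up empty.
Out of options, I remembered that a colleague, unable to read from S3, had replaced the Hadoop jars in the spark-jar directory, upgrading Hadoop 3.2 to Hadoop 3.3.1. Suspecting this might be the cause, I rolled the jars back and everything worked. That pinned down where the problem was.
But the jars could not stay rolled back, because the upgrade to 3.3.1 was required. Previously only the Hadoop-common and Hadoop-core related jars had been upgraded, so I tried upgrading the YARN related jars to 3.3.1 as well. After that the job ran normally, which confirmed the guess (embarrassingly, I still do not know the root cause).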
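Summary:
1. When this warning appears, one possibility is that the cluster really is short of resources. In a test environment you can free resources by killing other, less important applications; in production you should consider adding capacity.
2. This time the problem was not a resource shortage. Resources were plentiful and the resource request succeeded; only task dispatch failed. After repeated experiments the cause turned out to be inconsistent jar versions.
3. I did not work out the underlying mechanism and do not know which specific detail triggers the warning above.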
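The mixed-version situation described above (some Hadoop jars at 3.3.1, the YARN ones still at 3.2) can be spotted with a small filename-based check. The following is only a minimal sketch, not the method used in the original investigation: the function names `hadoop_versions`, `is_consistent` and `scan_dir` are hypothetical, and it assumes the jars follow the usual `hadoop-<artifact>-<version>.jar` naming.

```python
import os
import re
from collections import defaultdict

# Matches names such as "hadoop-common-3.3.1.jar" or "hadoop-yarn-client-3.2.0.jar".
# Assumption: the version is the dotted number right before ".jar".
HADOOP_JAR = re.compile(r"^(hadoop-[a-z0-9-]+?)-(\d+(?:\.\d+)+)\.jar$")


def hadoop_versions(jar_names):
    """Group Hadoop artifact names by the version embedded in the file name."""
    by_version = defaultdict(list)
    for name in jar_names:
        # hadoop-shaded-* jars come from hadoop-thirdparty and are versioned
        # independently, so they are skipped here.
        if name.startswith("hadoop-shaded-"):
            continue
        match = HADOOP_JAR.match(name)
        if match:
            artifact, version = match.groups()
            by_version[version].append(artifact)
    return {version: sorted(names) for version, names in by_version.items()}


def is_consistent(jar_names):
    """True when at most one Hadoop version is present."""
    return len(hadoop_versions(jar_names)) <= 1


def scan_dir(jars_dir):
    """Convenience wrapper: run the check on the file names in a directory."""
    return hadoop_versions(os.listdir(jars_dir))
```

Calling `scan_dir` on the Spark jars directory and getting more than one key back would be a hint that the upgrade was only partial. The check looks at file names only, so it cannot detect a jar whose contents do not match its name.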
II. Application report for application_1639448093344_0002 (state: ACCEPTED)
The cause is insufficient resources; YARN cannot allocate what the job requests:
21/12/14 10:31:28 INFO yarn.Client: Application report for application_1639448093344_0002 (state: ACCEPTED)
21/12/14 10:31:29 INFO yarn.Client: Application report for application_1639448093344_0002 (state: ACCEPTED)
21/12/14 10:31:30 INFO yarn.Client: Application report for application_1639448093344_0002 (state: ACCEPTED)
21/12/14 10:31:31 INFO yarn.Client: Application report for application_1639448093344_0002 (state: ACCEPTED)
21/12/14 10:31:32 INFO yarn.Client: Application report for application_1639448093344_0002 (state: ACCEPTED)
21/12/14 10:31:33 INFO yarn.Client: Application report for application_1639448093344_0002 (state: ACCEPTED)
21/12/14 10:31:34 INFO yarn.Client: Application report for application_1639448093344_0002 (state: ACCEPTED)
21/12/14 10:31:35 INFO yarn.Client: Application report for application_1639448093344_0002 (state: ACCEPTED)
21/12/14 10:31:36 INFO yarn.Client: Application report for application_1639448093344_0002 (state: ACCEPTED)
21/12/14 10:31:37 INFO yarn.Client: Application report for application_1639448093344_0002 (state: ACCEPTED)
21/12/14 10:31:38 INFO yarn.Client: Application report for application_1639448093344_0002 (state: ACCEPTED)
21/12/14 10:31:39 INFO yarn.Client: Application report for application_1639448093344_0002 (state: ACCEPTED)
21/12/14 10:31:40 INFO yarn.Client: Application report for application_1639448093344_0002 (state: ACCEPTED)
21/12/14 10:31:41 INFO yarn.Client: Application report for application_1639448093344_0002 (state: ACCEPTED)
21/12/14 10:31:42 INFO yarn.Client: Application report for application_1639448093344_0002 (state: ACCEPTED)
21/12/14 10:31:43 INFO yarn.Client: Application report for application_1639448093344_0002 (state: ACCEPTED)
21/12/14 10:31:44 INFO yarn.Client: Application report for application_1639448093344_0002 (state: ACCEPTED)
21/12/14 10:31:45 INFO yarn.Client: Application report for application_1639448093344_0002 (state: ACCEPTED)
21/12/14 10:31:46 INFO yarn.Client: Application report for application_1639448093344_0002 (state: ACCEPTED)
21/12/14 10:31:47 INFO yarn.Client: Application report for application_1639448093344_0002 (state: ACCEPTED)
21/12/14 10:31:48 INFO yarn.Client: Application report for application_1639448093344_0002 (state: ACCEPTED)
21/12/14 10:31:49 INFO yarn.Client: Application report for application_1639448093344_0002 (state: ACCEPTED)
21/12/14 10:31:50 INFO yarn.Client: Application report for application_1639448093344_0002 (state: ACCEPTED)
21/12/14 10:31:51 INFO yarn.Client: Application report for application_1639448093344_0002 (state: ACCEPTED)
21/12/14 10:31:52 INFO yarn.Client: Application report for application_1639448093344_0002 (state: ACCEPTED)
21/12/14 10:31:53 INFO yarn.Client: Application report for application_1639448093344_0002 (state: ACCEPTED)
21/12/14 10:31:54 INFO yarn.Client: Application report for application_1639448093344_0002 (state: ACCEPTED)
21/12/14 10:31:55 INFO yarn.Client: Application report for application_1639448093344_0002 (state: ACCEPTED)
21/12/14 10:31:56 INFO yarn.Client: Application report for application_1639448093344_0002 (state: ACCEPTED)
21/12/14 10:31:57 INFO yarn.Client: Application report for application_1639448093344_0002 (state: ACCEPTED)
21/12/14 10:31:58 INFO yarn.Client: Application report for application_1639448093344_0002 (state: ACCEPTED)
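To tell a short scheduling delay from an application that is genuinely stuck, it can help to measure how long the reports stay in ACCEPTED. The sketch below is only an illustration: the function name `seconds_in_accepted` is hypothetical, and it assumes the log lines have exactly the `yy/MM/dd HH:mm:ss INFO yarn.Client: ...` shape shown above.

```python
import re
from datetime import datetime

# Assumed line shape, taken from the log excerpt above.
REPORT = re.compile(
    r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) INFO yarn\.Client: "
    r"Application report for (application_\d+_\d+) \(state: (\w+)\)"
)


def seconds_in_accepted(log_lines):
    """Seconds between the first and last ACCEPTED report, per application id."""
    first, last = {}, {}
    for line in log_lines:
        match = REPORT.match(line.strip())
        if not match or match.group(3) != "ACCEPTED":
            continue
        timestamp = datetime.strptime(match.group(1), "%y/%m/%d %H:%M:%S")
        app_id = match.group(2)
        first.setdefault(app_id, timestamp)  # keep the earliest report
        last[app_id] = timestamp             # overwrite with the latest report
    return {app_id: (last[app_id] - first[app_id]).total_seconds() for app_id in first}
```

An application that remains in ACCEPTED for much longer than the cluster normally needs to start an ApplicationMaster is worth checking against the causes listed next.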
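Three possible causes:
- The cluster does not have enough resources;
- The resources configured in yarn-site are insufficient;
- The firewall on the cluster machines has not been turned off.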
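For the second cause, a quick sanity check is to compare the memory a Spark container would request with the limits in `yarn-site.xml`. The sketch below is an approximation under stated assumptions, not a definitive calculation: the function names are hypothetical, the overhead uses Spark's documented default of 10% of the heap with a 384 MB minimum, the 8192 MB fallbacks are assumed values for when a property is not set, and YARN's rounding of requests up to its minimum allocation is not modeled.

```python
import xml.etree.ElementTree as ET


def read_yarn_site(xml_text):
    """Return {name: value} for every <property> in a yarn-site.xml document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def container_mb(heap_mb, overhead_factor=0.10, min_overhead_mb=384):
    """Approximate container size: heap plus Spark's default memory overhead."""
    return heap_mb + max(min_overhead_mb, int(heap_mb * overhead_factor))


def fits(props, heap_mb):
    """True if one container of this heap size fits under both YARN limits."""
    need = container_mb(heap_mb)
    # Assumption: fall back to 8192 MB when a property is absent.
    node_mb = int(props.get("yarn.nodemanager.resource.memory-mb", 8192))
    max_alloc_mb = int(props.get("yarn.scheduler.maximum-allocation-mb", 8192))
    return need <= min(node_mb, max_alloc_mb)
```

For example, with `yarn.scheduler.maximum-allocation-mb` set to 2048, a 2048 MB executor heap does not fit, because the overhead pushes the request to 2432 MB. In that case either the request has to shrink or the yarn-site limits have to grow.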