Hadoop YARN usage: analyzing the "insufficient YARN resources" warning in Flink on YARN mode

Latest recommended article published on 2022-09-20 23:00:00

高冷張



Article link: https://blog.csdn.net/weixin_29208327/article/details/113537459


Background

On our real-time computing platform, submitting a Flink job to YARN through YarnClient hung, and the client kept printing the following log line:

(YarnClusterDescriptor.java:1036)- Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster

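My first guess was a resource allocation problem on YARN, so I opened the YARN web console and found that Apps Pending under Cluster Metrics was 1. What? Why would a newly submitted job be stuck in the pending state?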

1. First, check CPU and memory usage, which looks as follows:

CPU and memory are both plentiful, so nothing looks wrong here.
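The same numbers shown in the web console can also be read from the ResourceManager REST API, which is handy when the check has to be scripted. The following is a minimal sketch, assuming the ResourceManager web service is reachable at a hypothetical address such as http://localhost:8088 and that its `/ws/v1/cluster/metrics` response carries the `appsPending`, `availableMB`, `totalMB`, `availableVirtualCores` and `totalVirtualCores` fields; the function names are my own.

```python
import json
from urllib.request import urlopen


def fetch_cluster_metrics(rm_url):
    """Fetch the clusterMetrics object from the ResourceManager REST API.

    rm_url is an assumption, e.g. "http://localhost:8088".
    """
    with urlopen(f"{rm_url}/ws/v1/cluster/metrics", timeout=10) as resp:
        return json.load(resp)["clusterMetrics"]


def percent(part, total):
    """Return part/total as a percentage, or 0.0 when total is zero."""
    return round(100.0 * part / total, 1) if total else 0.0


def summarize_metrics(metrics):
    """Reduce clusterMetrics to the numbers relevant to a stuck submission."""
    return {
        "apps_pending": metrics["appsPending"],
        "memory_free_pct": percent(metrics["availableMB"], metrics["totalMB"]),
        "vcores_free_pct": percent(
            metrics["availableVirtualCores"], metrics["totalVirtualCores"]
        ),
    }


if __name__ == "__main__":
    # Hypothetical address; replace with the real ResourceManager URL.
    print(summarize_metrics(fetch_cluster_metrics("http://localhost:8088")))
```

A non-zero `apps_pending` together with high free percentages matches the situation described above: the cluster has spare capacity, yet the application is not scheduled.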

2. Check which scheduler is in use

The scheduler type used by the cluster is shown in the figure below:

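As shown, the cluster uses the Capacity Scheduler. This scheduler is designed to let multiple tenants share a large cluster safely, allocating resources promptly within each tenant's capacity limits. It is built around queues: jobs are submitted to a queue, each queue is assigned a share of the cluster's resources, and hierarchical queues, access control, user limits, reservations and similar settings are supported. Tuning these capacity shares does take some experience. Unlike the FIFO Scheduler, however, it avoids the situation where one large job monopolizes the cluster and leaves every other job pending.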
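If the console is not at hand, the scheduler type can also be read from the REST API. This is a small sketch, assuming the `/ws/v1/cluster/scheduler` endpoint returns a `scheduler.schedulerInfo.type` field with values such as `capacityScheduler`, `fifoScheduler` or `fairScheduler`; the ResourceManager address and the function names are hypothetical.

```python
import json
from urllib.request import urlopen

# Mapping from the REST "type" value to the name shown in the web console.
SCHEDULER_NAMES = {
    "capacityScheduler": "Capacity Scheduler",
    "fifoScheduler": "FIFO Scheduler",
    "fairScheduler": "Fair Scheduler",
}


def fetch_scheduler_info(rm_url):
    """Fetch the schedulerInfo object from the ResourceManager REST API.

    rm_url is an assumption, e.g. "http://localhost:8088".
    """
    with urlopen(f"{rm_url}/ws/v1/cluster/scheduler", timeout=10) as resp:
        return json.load(resp)["scheduler"]["schedulerInfo"]


def describe_scheduler(info):
    """Return a readable scheduler name, or "unknown" for unexpected input."""
    return SCHEDULER_NAMES.get(info.get("type"), "unknown")


if __name__ == "__main__":
    # Hypothetical address; replace with the real ResourceManager URL.
    print(describe_scheduler(fetch_scheduler_info("http://localhost:8088")))
```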

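3. Check the job queue

The figure above shows that Configured Minimum User Limit Percent is set to 100%. Because the cluster is still fairly small, queues have not been split by tenant and every job runs in the default queue. Used capacity is only 38.2%, and the queue can hold up to 10000 applications, far more than the number actually submitted. Nothing here seems to point to a problem either.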
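The queue values read off the console above can be collected programmatically as well. The sketch below walks a `schedulerInfo`-style dictionary (as returned by `/ws/v1/cluster/scheduler` on a Capacity Scheduler cluster) and reports the leaf queues. The field names `userLimit`, `usedCapacity`, `maxApplications`, `numApplications` and `numPendingApplications` are what I would expect in that response, but treat them as assumptions and check them against your Hadoop version; the function names are my own.

```python
def iter_leaf_queues(queue_info):
    """Yield leaf queues from a schedulerInfo-style dict, depth first."""
    children = (queue_info.get("queues") or {}).get("queue", [])
    if not children:
        # No child queues: this node is itself a leaf queue.
        yield queue_info
        return
    for child in children:
        yield from iter_leaf_queues(child)


def queue_report(queue):
    """Pick out the fields discussed in this step; missing fields become None."""
    return {
        "name": queue.get("queueName"),
        "used_capacity_pct": queue.get("usedCapacity"),
        "user_limit_pct": queue.get("userLimit"),
        "max_applications": queue.get("maxApplications"),
        "num_applications": queue.get("numApplications"),
        "num_pending": queue.get("numPendingApplications"),
    }


def report_all(scheduler_info):
    """Return one report per leaf queue under the given root."""
    return [queue_report(q) for q in iter_leaf_queues(scheduler_info)]
```

Using `.get` throughout keeps the sketch tolerant of fields that a given Hadoop release does not expose, at the cost of returning `None` instead of failing loudly.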

4. Check the ResourceManager log

The log reads as follows:

2020-11-26 19:33:46,669 INFO org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated new applicationId: 317
2020-11-26 19:33:48,874 INFO org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Application with id 317 submitted by user root
2020-11-26 19:33:48,874 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Storing application with id application_1593338489799_0317
2020-11-26 19:33:48,874 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root    IP=x.x.x.x    OPERATION=Submit Application Request    TARGET=ClientRMService    RESULT=SUCCESS    APPID=application_1593338489799_0317
2020-11-26 19:33:48,874 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1593338489799_0317 State change from NEW to NEW_SAVING on event=START
2020-11-26 19:33:48,875 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Storing info for app: application_1593338489799_0317
2020-11-26 19:33:48,875 INFO org.apache.hadoop.yarn.server.resou
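When reading a long ResourceManager log, it helps to pull out only the state transitions of the application in question. Here is a minimal sketch, assuming entries start with a `yyyy-MM-dd HH:mm:ss,SSS` timestamp and that transitions are logged in the `State change from X to Y on event=Z` form seen in the excerpt above. It splits on the timestamp rather than on newlines, so it also works when entries are run together on one line. The function names are my own.

```python
import re

# Zero-width match in front of every timestamp that starts a log entry.
ENTRY_START = re.compile(r"(?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} )")

# Matches e.g. "application_1_0001 State change from NEW to NEW_SAVING on event=START".
STATE_CHANGE = re.compile(
    r"(application_\d+_\d+) State change from (\w+) to (\w+) on event\s*=\s*(\w+)"
)


def split_entries(text):
    """Split raw log text into entries, one per leading timestamp."""
    return [part.strip() for part in ENTRY_START.split(text) if part.strip()]


def state_transitions(text):
    """Return (app_id, old_state, new_state, event) for each state change."""
    transitions = []
    for entry in split_entries(text):
        match = STATE_CHANGE.search(entry)
        if match:
            transitions.append(match.groups())
    return transitions
```

Calling `state_transitions` on the log text gives the sequence of states the application went through, which makes it easier to see at which state a pending application stops advancing.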
