I. Problem background
1. Error log from job submission
2023-06-29 15:48:20,877 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster
2023-06-29 15:48:21,129 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster
2023-06-29 15:48:21,381 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster
2023-06-29 15:48:21,633 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster
2023-06-29 15:48:21,885 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster
2023-06-29 15:48:22,137 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster
2023-06-29 15:48:22,389 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster
2023-06-29 15:48:22,641 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster
2023-06-29 15:48:22,894 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster
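The log only says that deployment is slow; it does not say why. One way to see which applications are waiting is to ask the ResourceManager REST API for applications in the ACCEPTED state, meaning accepted by the scheduler but with no ApplicationMaster running yet. Below is a minimal sketch, assuming the RM web endpoint is reachable. The address `http://rm-host:8088` and the helper names `fetch_accepted` and `stuck_apps` are hypothetical.

```python
import json
import urllib.request

RM_URL = "http://rm-host:8088"  # hypothetical ResourceManager web address


def fetch_accepted(rm_url: str = RM_URL) -> dict:
    """Fetch applications that are accepted but whose AM is not running yet."""
    with urllib.request.urlopen(f"{rm_url}/ws/v1/cluster/apps?states=ACCEPTED") as resp:
        return json.load(resp)


def stuck_apps(payload: dict) -> list[tuple[str, str, str]]:
    """Return (id, name, queue) for each app in a /ws/v1/cluster/apps response."""
    # The RM returns {"apps": null} when no application matches the filter.
    apps = (payload.get("apps") or {}).get("app") or []
    return [(a["id"], a["name"], a["queue"]) for a in apps]
```

Usage: `stuck_apps(fetch_accepted())`. If the Flink applications all sit in the same queue in the ACCEPTED state while the cluster still has free capacity, that suggests a queue-level limit rather than a shortage of cluster resources.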
2. Problem description
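Cluster resources: the cluster has 180 CPU cores and 228 GB of memory available in total, of which 196 GB of memory and 120 cores are currently in use.
That leaves 32 GB of memory and 60 cores free, yet new jobs can no longer be submitted; the client keeps printing the log shown above.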
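The free capacity can be checked against the ResourceManager's own numbers. Below is a minimal sketch, assuming the `clusterMetrics` object returned by the RM REST endpoint `/ws/v1/cluster/metrics` (fields `totalMB`, `allocatedMB`, `totalVirtualCores`, `allocatedVirtualCores`). The sample values are the figures quoted above converted to MB, not data captured from this cluster. The function name `free_resources` is hypothetical.

```python
def free_resources(metrics: dict) -> tuple[int, int]:
    """Return (free memory in GB, free vcores) from a clusterMetrics object."""
    free_mb = metrics["totalMB"] - metrics["allocatedMB"]
    free_vcores = metrics["totalVirtualCores"] - metrics["allocatedVirtualCores"]
    return free_mb // 1024, free_vcores


# Figures from the problem description, expressed the way the RM reports them.
sample = {
    "totalMB": 228 * 1024,
    "allocatedMB": 196 * 1024,
    "totalVirtualCores": 180,
    "allocatedVirtualCores": 120,
}
print(free_resources(sample))  # (32, 60)
```

Free cluster capacity alone does not guarantee that a new application can start, because the scheduler also enforces per-queue limits.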
II. Resolution
1. Explanation
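By default, the Fair Scheduler does not let a queue spend all of its resources on ApplicationMasters (AMs). The AMs running in a leaf queue may together use at most maxAMShare × the queue's fair share, and in Apache Hadoop maxAMShare defaults to 0.5. On YARN, every Flink cluster runs its JobManager in the AM container, and in per-job mode that means one AM per job. Once this limit is reached, newly submitted applications stay in the ACCEPTED state and the client keeps logging the message above, even though the cluster still has free capacity. What follows is the quick fix I applied to stay on schedule; I plan to study the details later. For a fuller explanation, see that blogger's articles "yarn队列之fair队列" (YARN queues: the Fair queue) and "YARN三种资源调度器解析" (An analysis of YARN's three resource schedulers).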
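The Fair Scheduler's per-queue AM limit can be modeled roughly as shown below. This is a simplified model, not the actual scheduler code. A new AM is admitted only if the AM resources already running in the leaf queue plus the new AM fit, in both memory and vcores, within maxAMShare × the queue's fair share. A negative maxAMShare (-1) turns the check off. The `Resource` class, the `can_run_am` function, and the sample sizes are hypothetical. The example also assumes the queue's fair share equals the whole cluster, which only holds when it is the only active queue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    memory_mb: int
    vcores: int


def can_run_am(am_usage: Resource, new_am: Resource,
               fair_share: Resource, max_am_share: float) -> bool:
    """Simplified model of the Fair Scheduler's per-queue AM limit."""
    if max_am_share < 0:  # -1 disables the AM share check
        return True
    limit_mb = fair_share.memory_mb * max_am_share
    limit_vcores = fair_share.vcores * max_am_share
    # Memory and vcores must both fit under the limit.
    return (am_usage.memory_mb + new_am.memory_mb <= limit_mb
            and am_usage.vcores + new_am.vcores <= limit_vcores)


fair = Resource(228 * 1024, 180)  # assumed: fair share = whole cluster
used = Resource(112 * 1024, 56)   # hypothetical AM usage already in the queue
new = Resource(4 * 1024, 1)       # hypothetical JobManager container
print(can_run_am(used, new, fair, 0.5))  # False: 116 GB > 114 GB
print(can_run_am(used, new, fair, 1.0))  # True
```

Raising maxAMShare to 1 makes the AM limit equal to the queue's whole fair share, which is the change made in the steps below.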
2. Steps
In CDH, open the YARN configuration page and type the keyword "MaxAMShare" into its search box to find the Fair Scheduler configuration.
After formatting, the Fair Scheduler configuration looks like this:
{
"defaultFairSharePreemptionThreshold":null,
"defaultFairSharePreemptionTimeout":null,
"defaultMinSharePreemptionTimeout":null,
"defaultQueueSchedulingPolicy":"fair",
"queueMaxAMShareDefault":1,
"queueMaxAppsDefault":null,
"queuePlacementRules":[
{
"create":true,
"name":"specified",
"queue":null,
"rules":null
},
{
"create":true,
"name":"nestedUserQueue",
"queue":null,
"rules":[
{
"create":true,
"name":"default",
"queue":"users",
"rules":null
}
]
},
{
"create":null,
"name":"default",
"queue":null,
"rules":null
}
],
"queues":[
{
"aclAdministerApps":"*",
"aclSubmitApps":"*",
"allowPreemptionFrom":null,
"fairSharePreemptionThreshold":null,
"fairSharePreemptionTimeout":null,
"minSharePreemptionTimeout":null,
"name":"root",
"queues":[
{
"aclAdministerApps":null,
"aclSubmitApps":null,
"allowPreemptionFrom":null,
"fairSharePreemptionThreshold":null,
"fairSharePreemptionTimeout":null,
"minSharePreemptionTimeout":null,
"name":"users",
"queues":[
{
"aclAdministerApps":null,
"aclSubmitApps":null,
"allowPreemptionFrom":null,
"fairSharePreemptionThreshold":null,
"fairSharePreemptionTimeout":null,
"minSharePreemptionTimeout":null,
"name":"admin",
"queues":[
],
"schedulablePropertiesList":[
{
"impalaClampMemLimitQueryOption":null,
"impalaDefaultQueryMemLimit":null,
"impalaDefaultQueryOptions":null,
"impalaMaxMemory":null,
"impalaMaxQueryMemLimit":null,
"impalaMaxQueuedQueries":null,
"impalaMaxRunningQueries":null,
"impalaMinQueryMemLimit":null,
"impalaQueueTimeout":null,
"maxAMShare":1,
"maxChildResources":null,
"maxResources":null,
"maxRunningApps":null,
"minResources":null,
"scheduleName":"default",
"weight":100
}
],
"schedulingPolicy":"drf",
"type":null
}
],
"schedulablePropertiesList":[
{
"impalaClampMemLimitQueryOption":null,
"impalaDefaultQueryMemLimit":null,
"impalaDefaultQueryOptions":null,
"impalaMaxMemory":null,
"impalaMaxQueryMemLimit":null,
"impalaMaxQueuedQueries":null,
"impalaMaxRunningQueries":null,
"impalaMinQueryMemLimit":null,
"impalaQueueTimeout":null,
"maxAMShare":1,
"maxChildResources":null,
"maxResources":null,
"maxRunningApps":null,
"minResources":null,
"scheduleName":"default",
"weight":1
}
],
"schedulingPolicy":"drf",
"type":"parent"
}
],
"schedulablePropertiesList":[
{
"impalaClampMemLimitQueryOption":null,
"impalaDefaultQueryMemLimit":null,
"impalaDefaultQueryOptions":null,
"impalaMaxMemory":null,
"impalaMaxQueryMemLimit":null,
"impalaMaxQueuedQueries":null,
"impalaMaxRunningQueries":null,
"impalaMinQueryMemLimit":null,
"impalaQueueTimeout":null,
"maxAMShare":1,
"maxChildResources":null,
"maxResources":null,
"maxRunningApps":null,
"minResources":null,
"scheduleName":"default",
"weight":1
}
],
"schedulingPolicy":"drf",
"type":null
}
],
"userMaxAppsDefault":null,
"users":[
]
}
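Editing every "maxAMShare" entry by hand is error-prone in a nested configuration like the one above. Below is a minimal sketch that walks the JSON and sets the value everywhere. It assumes the structure shown: each queue keeps its children in "queues" and its properties in "schedulablePropertiesList". The function name `set_max_am_share` is hypothetical, and scripting the change is an illustration only. The edited JSON would still have to be pasted back into the configuration page and saved.

```python
import json


def set_max_am_share(config: dict, value: float = 1.0) -> int:
    """Set queueMaxAMShareDefault and every queue's maxAMShare.

    Returns the number of schedulable-property entries that were changed.
    """
    config["queueMaxAMShareDefault"] = value
    changed = 0
    stack = list(config.get("queues") or [])
    while stack:  # iterative walk over the queue tree
        queue = stack.pop()
        for props in queue.get("schedulablePropertiesList") or []:
            props["maxAMShare"] = value
            changed += 1
        stack.extend(queue.get("queues") or [])
    return changed


raw = ('{"queueMaxAMShareDefault": 0.5, "queues": [{"name": "root",'
       ' "schedulablePropertiesList": [{"maxAMShare": 0.5}], "queues": [{"name": "users",'
       ' "schedulablePropertiesList": [{"maxAMShare": 0.5}], "queues": []}]}]}')
cfg = json.loads(raw)
print(set_max_am_share(cfg))  # 2
print(json.dumps(cfg, indent=2))
```

According to the Hadoop documentation, maxAMShare only takes effect on leaf queues, here root.users.admin. Queues created dynamically by the "nestedUserQueue" placement rule have no explicit maxAMShare, so they fall back to "queueMaxAMShareDefault".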
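Key change: inside "queues":[] I set each relevant maxAMShare parameter to 1, which lets ApplicationMasters use all of the resources allocated to the queue. The configuration above already contains these changes; note that "queueMaxAMShareDefault" is also 1. After editing, just save the configuration.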
3. After the change, I resubmitted the jobs and all of them were submitted normally.