Troubleshooting why jobs hold

The company recently set up a Hadoop big-data test environment, running entirely on default parameters. Hive jobs submitted to it kept holding (accepted but never running). To investigate, I checked the logs in the YARN Web UI; the pages looked like this:
[Screenshots: YARN Web UI application list and ApplicationMaster log page]
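Besides the Web UI, the same ApplicationMaster log can be pulled from the command line (assuming log aggregation is enabled). A minimal sketch, reusing the application ID that appears in the log below:

# fetch the aggregated logs for the held application
yarn logs -applicationId application_1536888000030_0006
# list applications still waiting to run
yarn application -list -appStates ACCEPTED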
The log content is as follows:
2018-09-14 10:00:06,939 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1536888000030_0006_m_000000_0 : 13562
2018-09-14 10:00:06,941 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1536888000030_0006_m_000000_0] using containerId: [container_e18_1536888000030_0006_01_000002 on NM: [172.16.1.46:5006]
2018-09-14 10:00:06,943 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1536888000030_0006_m_000000_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
2018-09-14 10:00:06,943 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1536888000030_0006_m_000000 Task Transitioned from SCHEDULED to RUNNING
2018-09-14 10:00:07,808 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() for application_1536888000030_0006: ask=1 release= 0 newContainers=0 finishedContainers=0 resourcelimit=<memory:13924, vCores:7> knownNMs=3
2018-09-14 10:00:08,109 INFO [Socket Reader #1 for port 52795] SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for job_1536888000030_0006 (auth:SIMPLE)
2018-09-14 10:00:08,124 INFO [IPC Server handler 0 on 52795] org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID : jvm_1536888000030_0006_m_19791209299970 asked for a task
2018-09-14 10:00:08,124 INFO [IPC Server handler 0 on 52795] org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID: jvm_1536888000030_0006_m_19791209299970 given task: attempt_1536888000030_0006_m_000000_0
2018-09-14 10:00:14,745 INFO [IPC Server handler 1 on 52795] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1536888000030_0006_m_000000_0 is : 1.0
2018-09-14 10:00:44,776 INFO [IPC Server handler 7 on 52795] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1536888000030_0006_m_000000_0 is : 1.0
... (the same progress line, still reporting 1.0, repeats roughly every 30 seconds; intermediate entries from 10:01:14 to 10:08:12 omitted) ...
2018-09-14 10:08:42,225 INFO [IPC Server handler 14 on 52795] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1536888000030_0006_m_000000_0 is : 1.0

On top of that, YARN's resource scheduling defaults to the CapacityScheduler, and the parameter yarn.scheduler.capacity.maximum-am-resource-percent defaults to 0.1: ApplicationMasters may together use at most 10% of cluster memory. Once that AM quota is exhausted, newly submitted jobs sit in the ACCEPTED state even though the Scheduler page of the YARN Web UI shows plenty of free resources, as in this screenshot:
[Screenshot: YARN Web UI Scheduler page showing idle cluster resources]
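A quick way to confirm which scheduler is active is to check yarn-site.xml; on stock Apache Hadoop the CapacityScheduler is the default:

<!-- yarn-site.xml: scheduler used by the ResourceManager -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>

To see why this bites, an illustrative calculation: with, say, 48 GB of schedulable memory and the 0.1 default, the AM pool is only 4.8 GB; each MapReduce AM requests 1536 MB by default (yarn.app.mapreduce.am.resource.mb), so only about three AMs can run concurrently and every further job waits in ACCEPTED.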

Solution:
If your cluster is instead on the FairScheduler, raise maxAMShare (default 0.5) inside the relevant queue element in fair-scheduler.xml; setting it to -1.0 disables the check entirely:

<maxAMShare>1.0</maxAMShare>

If you are on the CapacityScheduler (the default, as here), edit capacity-scheduler.xml; the default value is 0.1:

<property>
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <value>0.8</value>
</property>

At the same time, push the updated file to both Master HA machines: the ResourceManager is what reads this configuration, so updating the master HA nodes' copies is enough.
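A minimal sketch of distributing the file; the host names rm1 and rm2 and the config path are placeholders for your two ResourceManager HA nodes:

# copy the edited scheduler config to both RM HA machines (hypothetical hosts/path)
scp capacity-scheduler.xml rm1:/etc/hadoop/conf/
scp capacity-scheduler.xml rm2:/etc/hadoop/conf/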

Any modification to the queues must be followed by yarn rmadmin -refreshQueues; the command can be run from any cluster node. A refresh-and-verify sequence is sketched below.
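For reference, the RM host name and Web UI port 8088 below are assumptions for a default setup:

# apply queue changes on the active ResourceManager, no restart required
yarn rmadmin -refreshQueues
# inspect the effective scheduler settings via the RM REST API
curl http://<rm-host>:8088/ws/v1/cluster/scheduler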
With the above changes in place, resubmit the job from the Hue workflow page; it now runs:
[Screenshot: Hue workflow page after resubmitting the job]
Checking the corresponding Hive operation log, however, turns up the following error:
[Screenshot: Hive operation log showing the error]
Investigation shows this one is a parameter issue!
