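Problem:
Running Hadoop on Kylin (or Ubuntu) knocks out SSH on the machine: the host still answers ping, but SSH connections fail.
Only rebooting the machine restores access.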
Environment:
OS: Kylin 4.0.2, Ubuntu 16.04 (kernel 4.4.x)
procps version: 3.3.10-4kord3k1
----
Log analysis:
The Hadoop ERROR log looks roughly like this:
ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: RECEIVED SIGNAL 15: SIGTERM
2017-10-19 14:02:28,227 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1508392462048_0005_01_000033 Container Transitioned from ALLOCATED to ACQUIRED
2017-10-19 14:02:28,316 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
2017-10-19 14:02:28,322 INFO org.mortbay.log: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:8088
2017-10-19 14:02:28,323 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
2017-10-19 14:02:28,323 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
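Analysis 2:
Running /bin/kill -15 -1212121 is enough to cause the outage,
because the kill binary shipped with this procps version has a bug.
Signal 15 is SIGTERM, which matches the "RECEIVED SIGNAL 15: SIGTERM" line in the ResourceManager log above.
According to the Launchpad reports linked at the end of this note, this kill reportedly mishandles a negative (process-group) ID and delivers the signal far beyond the intended group, which would explain why sshd dies while the kernel still answers ping.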
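Before replacing anything, it can help to confirm which kill is actually in play. The sketch below is only an illustration: describe_kill is a hypothetical helper name, and it assumes /bin/kill is the procps-ng implementation, which prints its version with --version. Other kill implementations may not accept that flag, so a failure there is tolerated.

```shell
#!/bin/sh
# describe_kill is a hypothetical helper, not part of procps or Hadoop.
# It reports how the shell resolves "kill" and, if present, the version
# string of the /bin/kill binary named in this note.
describe_kill() {
    # "type" shows whether kill resolves to a shell builtin or a file.
    type kill
    if [ -x /bin/kill ]; then
        # Assumption: /bin/kill comes from procps-ng and supports --version.
        /bin/kill --version 2>/dev/null || echo "/bin/kill: version unavailable"
    else
        echo "/bin/kill: not found"
    fi
}

describe_kill
```

Note that a shell's builtin kill is a separate implementation from /bin/kill, so a command typed at an interactive prompt may not go through the procps binary at all.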
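Solution:
1. Copy /bin/kill from another machine whose procps version is 3.3.10-4ubuntu2.2 or later,
and use it to replace the kill binary on the affected machine.
2. Upgrade procps on the affected system to 3.3.10-4ubuntu2.2 or later.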
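Whether a machine already has a fixed procps can be checked roughly as follows. This is a minimal sketch, assuming a dpkg-based system; procps_is_fixed and installed_procps are hypothetical helper names. The "2:" epoch in FIXED is taken from the Launchpad release URL in this note. Kylin's "kord" revisions are a separate package lineage, so the comparison is only indicative there.

```shell
#!/bin/sh
# Minimum fixed version, including the "2:" epoch shown on the Launchpad page.
FIXED="2:3.3.10-4ubuntu2.2"

# Return 0 if version $1 is at least $FIXED under dpkg's version ordering.
procps_is_fixed() {
    dpkg --compare-versions "$1" ge "$FIXED"
}

# Print the installed procps version, or nothing if it cannot be queried.
installed_procps() {
    dpkg-query -W -f='${Version}' procps 2>/dev/null
}

current="$(installed_procps)"
if [ -n "$current" ] && procps_is_fixed "$current"; then
    echo "procps $current: at or above $FIXED"
else
    echo "procps ${current:-not found}: upgrade needed"
    # Suggested command only; the script does not run it.
    echo "  sudo apt-get update && sudo apt-get install --only-upgrade procps"
fi
```

The script only prints the upgrade command rather than running it, so it is safe to use as a read-only check across several hosts.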
References (web resources):
1. Hadoop gets killed: https://bugs.launchpad.net/ubuntu/+source/procps/+bug/1610499
2. kill in procps: https://bugs.launchpad.net/ubuntu/+source/procps/+bug/1637026
3. procps release notes: https://launchpad.net/ubuntu/+source/procps/2:3.3.10-4ubuntu2.2