Cause: the ResourceManager (RM) was restarted manually, or was restarted for some other reason, and Ambari did not detect it.
Error message:
stderr:
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/YARN/package/scripts/resourcemanager.py", line 261, in <module>
    Resourcemanager().execute()
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 352, in execute
    method(env)
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 996, in restart
    raise Fail("Stop command finished but process keep running.")
resource_management.core.exceptions.Fail: Stop command finished but process keep running.
stdout:
Go to the directory that holds the PID files of the yarn user's JVM processes, including the ResourceManager:
/tmp/hsperfdata_yarn
At this point the directory will contain several PID files, so when the RM starts it cannot tell which one to use.
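To see which of the files in /tmp/hsperfdata_yarn are stale, here is a minimal Python sketch. It assumes each file is named after a JVM's PID (which is how the JVM names its hsperfdata files) and, for the ResourceManager check, assumes Linux so that /proc/<pid>/cmdline is available. HSPERFDATA_DIR, RM_MAIN_CLASS and the helper functions are hypothetical names chosen for this illustration; run it as root or as the yarn user so the directory is readable.

```python
import os
from typing import Dict, List

# Directory from the note above (assumption: yarn user, default /tmp location).
HSPERFDATA_DIR = "/tmp/hsperfdata_yarn"
# Main class of the YARN ResourceManager JVM.
RM_MAIN_CLASS = "org.apache.hadoop.yarn.server.resourcemanager.ResourceManager"


def pid_is_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (signal 0 sends nothing)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def is_resourcemanager(pid: int) -> bool:
    """Linux-only assumption: look for the RM main class in /proc/<pid>/cmdline."""
    try:
        with open(f"/proc/{pid}/cmdline", "rb") as f:
            cmdline = f.read().replace(b"\0", b" ").decode(errors="replace")
    except OSError:
        return False
    return RM_MAIN_CLASS in cmdline


def classify_pid_files(directory: str = HSPERFDATA_DIR) -> Dict[str, List[int]]:
    """Split PID-named files into stale, live RM, and other live JVMs."""
    result: Dict[str, List[int]] = {"stale": [], "live_rm": [], "live_other": []}
    # Only purely numeric names are treated as PID files; others are skipped.
    names = [n for n in os.listdir(directory) if n.isascii() and n.isdigit()]
    for pid in sorted(int(n) for n in names):
        if not pid_is_alive(pid):
            result["stale"].append(pid)
        elif is_resourcemanager(pid):
            result["live_rm"].append(pid)
        else:
            result["live_other"].append(pid)
    return result


if __name__ == "__main__":
    if os.path.isdir(HSPERFDATA_DIR):
        for kind, pids in classify_pid_files().items():
            print(f"{kind}: {pids}")
    else:
        print(f"{HSPERFDATA_DIR} not found")
```

More than one entry under live_rm, or entries under stale, would match the situation described above.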
Fix: delete the stale PID files (or delete all of the RM's PID files), then restart the RM service.
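The cleanup step could be sketched in Python as follows, again assuming the files are named by PID. It removes only files whose process no longer exists and defaults to a dry run; remove_stale_pid_files and HSPERFDATA_DIR are hypothetical names, not part of Ambari.

```python
import os
from typing import List

# Assumed location, as in the note above.
HSPERFDATA_DIR = "/tmp/hsperfdata_yarn"


def pid_is_alive(pid: int) -> bool:
    """Signal 0 checks for existence without actually signalling the process."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True


def remove_stale_pid_files(directory: str = HSPERFDATA_DIR, dry_run: bool = True) -> List[str]:
    """Delete PID-named files whose process is gone; return their paths.

    With dry_run=True (the default) nothing is deleted, only reported.
    """
    removed: List[str] = []
    for name in sorted(os.listdir(directory)):
        if not (name.isascii() and name.isdigit()):
            continue  # not a PID file
        if pid_is_alive(int(name)):
            continue  # leave files of running JVMs alone
        path = os.path.join(directory, name)
        if not dry_run:
            os.remove(path)
        removed.append(path)
    return removed


if __name__ == "__main__":
    if os.path.isdir(HSPERFDATA_DIR):
        for path in remove_stale_pid_files(dry_run=True):
            print("stale:", path)
```

Run it once with dry_run=True, check the list, then call it with dry_run=False and restart the ResourceManager from Ambari. Deleting all of the RM's PID files, the other option in the fix, is left as a manual step because it would also remove the file of an RM that is still running.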