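The time cloudera-scm-agent would not start
A few days ago, while I was adding a disk, the system rebooted unexpectedly. I had not stopped the cloudera-scm-server and cloudera-scm-agent processes beforehand, and after the reboot cloudera-scm-server started normally but cloudera-scm-agent did not.
Output of the command: systemctl status cloudera-scm-agent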
● cloudera-scm-agent.service - LSB: Cloudera SCM Agent
Loaded: loaded (/etc/rc.d/init.d/cloudera-scm-agent; bad; vendor preset: disabled)
Active: failed (Result: exit-code) since Thu 2019-08-22 22:52:37 CST; 10h ago
Docs: man:systemd-sysv-generator(8)
Process: 16911 ExecStop=/etc/rc.d/init.d/cloudera-scm-agent stop (code=exited, status=0/SUCCESS)
Process: 18991 ExecStart=/etc/rc.d/init.d/cloudera-scm-agent start (code=exited, status=1/FAILURE)
Aug 22 22:52:37 test cloudera-scm-agent[18991]: install: cannot create directory ‘/var/run’: File exists
Aug 22 22:52:37 test su[19009]: (to root) root on none
Aug 22 22:52:37 test su[19009]: pam_systemd(su:session): Failed to connect to system bus: No such file or directory
Aug 22 22:52:37 test su[19009]: pam_unix(su:session): session opened for user root by (uid=0)
Aug 22 22:52:37 test su[19009]: pam_unix(su:session): session closed for user root
Aug 22 22:52:37 test cloudera-scm-agent[18991]: Starting cloudera-scm-agent: [FAILED]
Aug 22 22:52:37 test systemd[1]: cloudera-scm-agent.service: control process exited, code=exited status=1
Aug 22 22:52:37 test systemd[1]: Failed to start LSB: Cloudera SCM Agent.
Aug 22 22:52:37 test systemd[1]: Unit cloudera-scm-agent.service entered failed state.
Aug 22 22:52:37 test systemd[1]: cloudera-scm-agent.service failed.
Checking the logs
Looking at this pile of messages, I had no idea at first what had happened, so I decided to check the logs.
Command: tail -1000f /var/log/cloudera-scm-agent/cloudera-scm-agent.log
Traceback (most recent call last):
File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.14.0-py2.7.egg/cmf/monitor/firehose.py", line 116, in _send
self._port)
File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/ipc.py", line 469, in __init__
self.conn.connect()
File "/usr/lib64/python2.7/httplib.py", line 824, in connect
self.timeout, self.source_address)
File "/usr/lib64/python2.7/socket.py", line 571, in create_connection
raise err
error: [Errno 111] Connection refused
I still could not make anything of this, so I looked at the other log.
Command: tail -100f /var/log/cloudera-scm-agent/cloudera-scm-agent.out
[root@test cloudera-scm-agent]# tail -10000f cloudera-scm-agent.out
[13/Aug/2019 13:09:49 +0000] 16976 MainThread agent INFO SCM Agent Version: 5.14.0
[13/Aug/2019 13:09:49 +0000] 16976 MainThread agent WARNING Expected mode 0751 for /run/cloudera-scm-agent but was 0755
[13/Aug/2019 13:09:49 +0000] 16976 MainThread agent INFO Re-using pre-existing directory: /run/cloudera-scm-agent
[13/Aug/2019 14:24:07 +0000] 32268 MainThread agent INFO SCM Agent Version: 5.14.0
[13/Aug/2019 14:24:07 +0000] 32268 MainThread agent WARNING Expected mode 0751 for /run/cloudera-scm-agent but was 0755
[13/Aug/2019 14:24:07 +0000] 32268 MainThread agent INFO Re-using pre-existing directory: /run/cloudera-scm-agent
[22/Aug/2019 13:51:57 +0000] 1475 MainThread agent INFO SCM Agent Version: 5.14.0
[22/Aug/2019 13:51:57 +0000] 1475 MainThread agent WARNING Expected mode 0751 for /run/cloudera-scm-agent but was 0755
[22/Aug/2019 13:51:57 +0000] 1475 MainThread agent INFO Re-using pre-existing directory: /run/cloudera-scm-agent
[22/Aug/2019 22:35:59 +0000] 17962 MainThread agent INFO SCM Agent Version: 5.14.0
[22/Aug/2019 22:35:59 +0000] 17962 MainThread agent WARNING Expected mode 0751 for /run/cloudera-scm-agent but was 0755
[22/Aug/2019 22:35:59 +0000] 17962 MainThread agent INFO Re-using pre-existing directory: /run/cloudera-scm-agent
Unable to create the pidfile.
[22/Aug/2019 22:52:37 +0000] 19011 MainThread agent INFO SCM Agent Version: 5.14.0
[22/Aug/2019 22:52:37 +0000] 19011 MainThread agent WARNING Expected mode 0751 for /run/cloudera-scm-agent but was 0755
[22/Aug/2019 22:52:37 +0000] 19011 MainThread agent INFO Re-using pre-existing directory: /run/cloudera-scm-agent
Unable to create the pidfile.
[23/Aug/2019 09:13:53 +0000] 23991 MainThread agent INFO SCM Agent Version: 5.14.0
[23/Aug/2019 09:13:53 +0000] 23991 MainThread agent WARNING Expected mode 0751 for /var/run/cloudera-scm-agent but was 0755
[23/Aug/2019 09:13:53 +0000] 23991 MainThread agent INFO Re-using pre-existing directory: /var/run/cloudera-scm-agent
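Now there was a clue: the line Unable to create the pidfile. I searched Baidu, and most blog posts say the cause is that the cloudera-scm-agent directory is missing under /run and that creating it fixes the problem. In my case, however, the directory was already there.
So the trail seemed to have gone cold again.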
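The directory check can be sketched as a small shell function. This is only an illustration: check_run_dir is a hypothetical helper name, and stat -c assumes GNU coreutils (as on CentOS/RHEL). It reports whether the directory exists and prints its mode, which can be compared with the 0751 the agent log expects.

```shell
# Hypothetical helper: report whether a runtime directory exists and its mode.
check_run_dir() {
    dir="$1"
    if [ -d "$dir" ]; then
        # GNU stat: %a prints the permission bits in octal, e.g. 755
        echo "exists mode=$(stat -c '%a' "$dir")"
    else
        echo "missing"
    fi
}
```

For example, check_run_dir /run/cloudera-scm-agent would be the call matching the path in the log above.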
Don't overthink it
I ran systemctl status cloudera-scm-agent again and this time found a new clue in its output:
Process: 18991 ExecStart=/etc/rc.d/init.d/cloudera-scm-agent start (code=exited, status=1/FAILURE)
Aug 22 22:52:37 test cloudera-scm-agent[18991]: install: cannot create directory ‘/var/run’: File exists
Aug 22 22:52:37 test su[19009]: (to root) root on none
Aug 22 22:52:37 test su[19009]: pam_systemd(su:session): Failed to connect to system bus: No such file or directory
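The message says cannot create directory ‘/var/run’: File exists, so I went to look under /var and found a symlink there:
/var/run -> /run
I then looked up what /run is for: it is a temporary directory for runtime data, and its contents are cleared on reboot.
So I went ahead and deleted the symlink. After that, cloudera-scm-agent actually started successfully, so this seems to be where the problem was.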
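Before removing anything, it can help to confirm what /var/run actually is. The following is a minimal sketch, not the exact commands I ran: describe_path is a hypothetical helper name, and it only reports whether a path is a symlink (and its target), a real directory, something else, or missing.

```shell
# Hypothetical helper: describe what kind of filesystem object a path is.
describe_path() {
    p="$1"
    if [ -L "$p" ]; then
        # -L is true for symlinks, including dangling ones
        echo "symlink -> $(readlink "$p")"
    elif [ -d "$p" ]; then
        echo "directory"
    elif [ -e "$p" ]; then
        echo "other"
    else
        echo "missing"
    fi
}
```

Calling describe_path /var/run before and after the change shows whether the path is still a symlink or has become a plain directory.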
It wasn't over yet
After getting into CM, I ran into another problem when starting HDFS: the NameNode would not start. The log showed:
Failed to start namenode.
java.io.IOException: NameNode is not formatted.
Fine, so I formatted the NameNode manually on the cluster.
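The post does not record the exact command used, so the following is only a sketch under assumptions: it uses the standard hdfs namenode -format command and assumes it is run as the hdfs superuser via sudo. Formatting erases any existing NameNode metadata, so the hypothetical run wrapper below supports a dry run (DRY_RUN=1) that prints the command instead of executing it.

```shell
# Hypothetical dry-run wrapper: with DRY_RUN=1, print the command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Assumption: the format is run as the hdfs superuser.
# WARNING: this wipes existing NameNode metadata.
format_namenode() {
    run sudo -u hdfs hdfs namenode -format
}
```

With DRY_RUN=1 set, calling format_namenode only echoes the command, which is a safe way to review it first.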
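Next, starting Hive and Spark both failed with errors indicating that the read/write permissions on the /user directory had changed, so I had to set them again. After that, the Spark History Server would not start, and its log said: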
File does not exist: hdfs://xxx:8020/user/spark/spark2ApplicationHistory
Since it did not exist, I created it manually, and with that everything was finally resolved.
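Creating the missing directory can be sketched with the standard hdfs dfs subcommands. The owner (spark:spark) and mode (1777) below are assumptions based on common Spark history directory setups, not values taken from this cluster. The run wrapper and create_history_dir are hypothetical helper names, and DRY_RUN=1 prints the commands instead of executing them.

```shell
# Hypothetical dry-run wrapper: with DRY_RUN=1, print the command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create an HDFS directory and set its owner and mode.
# Assumptions: run as the hdfs superuser; spark:spark and 1777 are illustrative values.
create_history_dir() {
    dir="$1"
    run sudo -u hdfs hdfs dfs -mkdir -p "$dir"
    run sudo -u hdfs hdfs dfs -chown spark:spark "$dir"
    run sudo -u hdfs hdfs dfs -chmod 1777 "$dir"
}
```

For the path from the log, the call would be create_history_dir /user/spark/spark2ApplicationHistory. To see the current permissions on /user before changing anything, hdfs dfs -ls -d /user lists the directory entry itself.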
Sigh… that was exhausting.