Big Data Cluster Error Summary
HDFS Error Summary
NameNode fails to start
- stderr: /var/lib/ambari-agent/data/errors-580.txt
Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/HDFS/package/scripts/namenode.py", line 408, in <module>
NameNode().execute()
File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 352, in execute
method(env)
File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/HDFS/package/scripts/namenode.py", line 138, in start
upgrade_suspended=params.upgrade_suspended, env=env)
File "/usr/lib/ambari-agent/lib/ambari_commons/os_family_impl.py", line 89, in thunk
return fn(*args, **kwargs)
File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/HDFS/package/scripts/hdfs_namenode.py", line 199, in namenode
create_log_dir=True
File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/HDFS/package/scripts/utils.py", line 261, in service
Execute(daemon_cmd, not_if=process_id_exists_command, environment=hadoop_env_exports)
File "/usr/lib/ambari-agent/lib/resource_management/core/base.py", line 166, in __init__
self.env.run()
File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 160, in run
self.run_action(resource, action)
File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 124, in run_action
provider_action()
File "/usr/lib/ambari-agent/lib/resource_management/core/providers/system.py", line 263, in action_run
returns=self.resource.returns)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 72, in inner
result = function(command, **kwargs)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 102, in checked_call
tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy, returns=returns)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 150, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 314, in _call
raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of 'ambari-sudo.sh su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ; /usr/hdp/3.1.0.0-78/hadoop/bin/hdfs --config /usr/hdp/3.1.0.0-78/hadoop/conf --daemon start namenode'' returned 1.
- stdout: /var/lib/ambari-agent/data/output-580.txt
2020-04-11 10:50:35,438 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(210)) - Stopping NameNode metrics system...
2020-04-11 10:50:35,438 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(216)) - NameNode metrics system stopped.
2020-04-11 10:50:35,439 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:shutdown(607)) - NameNode metrics system shutdown complete.
2020-04-11 10:50:35,439 ERROR namenode.NameNode (NameNode.java:main(1715)) - Failed to start namenode.
java.net.BindException: Port in use: master:50070
at org.apache.hadoop.http.HttpServer2.constructBindException(HttpServer2.java:1197)
at org.apache.hadoop.http.HttpServer2.bindForSinglePort(HttpServer2.java:1219)
at org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:1278)
at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:1133)
at org.apache.hadoop.hdfs.server.namenode.NameNodeHttpServer.start(NameNodeHttpServer.java:177)
at org.apache.hadoop.hdfs.server.namenode.NameNode.startHttpServer(NameNode.java:869)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:691)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:937)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:910)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710)
Caused by: java.net.BindException: Cannot assign requested address
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:433)
at sun.nio.ch.Net.bind(Net.java:425)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at org.eclipse.jetty.server.ServerConnector.openAcceptChannel(ServerConnector.java:351)
at org.eclipse.jetty.server.ServerConnector.open(ServerConnector.java:319)
at org.apache.hadoop.http.HttpServer2.bindListener(HttpServer2.java:1184)
at org.apache.hadoop.http.HttpServer2.bindForSinglePort(HttpServer2.java:1215)
... 9 more
2020-04-11 10:50:35,440 INFO util.ExitUtil (ExitUtil.java:terminate(210)) - Exiting with status 1: java.net.BindException: Port in use: master:50070
2020-04-11 10:50:35,443 INFO namenode.NameNode (LogAdapter.java:info(51)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/39.99.193.128
************************************************************/
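- Solution
Start with the error log: resource_management.core.exceptions.ExecutionFailed: Execution of 'ambari-sudo.sh su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ; /usr/hdp/3.1.0.0-78/hadoop/bin/hdfs --config /usr/hdp/3.1.0.0-78/hadoop/conf --daemon start namenode'' returned 1. Searching for this message turned up no solution, since it only reports that the start command exited with status 1.
Next, read the output log carefully (the copy above is an incomplete excerpt). It contains java.net.BindException: Port in use: master:50070. First check whether the port is actually occupied: netstat -lnp|grep 50070. If it is, killing the process that holds it is enough: kill -9 <PID>. In this case, however, no process was using the port. So check the host configuration next: vim /etc/hosts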
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
39.**.***.128 master # public IP
123.**.***.54 slave1
123.**.***.90 slave2
59.**.***.72 slave3
39.***.***.21 slave4
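The master entry points to the public IP. This matches the "Caused by: java.net.BindException: Cannot assign requested address" line in the output log: on a cloud host the public IP is usually not configured on any local network interface, so the NameNode cannot bind to it. Replace it with the internal (private) IP: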
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
# 39.**.***.128 master # public IP
172.**.***.86 master # internal IP
123.**.***.54 slave1
123.**.***.90 slave2
59.**.***.72 slave3
39.***.***.21 slave4
Start the NameNode again and the problem is resolved.
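The two possible causes discussed above (a port that is really occupied versus a hostname that resolves to an address the machine does not own) can be told apart by attempting the bind directly. The following is a minimal sketch, not part of Ambari or Hadoop: the function names `classify_bind_error` and `diagnose_bind` are made up for illustration, and it assumes an IPv4 address and a Linux host.

```python
import errno
import socket


def classify_bind_error(err_no):
    """Map an errno raised by bind() to a short diagnosis string."""
    if err_no == errno.EADDRINUSE:
        return "in_use"         # another socket already holds this address/port
    if err_no == errno.EADDRNOTAVAIL:
        return "not_local"      # the IP is not assigned to any local interface
    if err_no == errno.EACCES:
        return "no_permission"  # e.g. a privileged port without root
    return "other"


def diagnose_bind(host, port):
    """Resolve host, try to bind a TCP socket to host:port, report the result."""
    try:
        ip = socket.gethostbyname(host)  # consults /etc/hosts first on most setups
    except socket.gaierror:
        return "unresolvable"
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((ip, port))
        return "ok"
    except OSError as exc:
        return classify_bind_error(exc.errno)
    finally:
        sock.close()
```

Calling `diagnose_bind("master", 50070)` on the master node while the NameNode is stopped would be expected to return `"not_local"` when `/etc/hosts` maps `master` to an address the host does not own, and `"in_use"` when another process really holds the port.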
NFSGateway fails to start
- stderr
Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/HDFS/package/scripts/nfsgateway.py", line 92, in <module>
NFSGateway().execute()
File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 352, in execute
method(env)
File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/HDFS/package/scripts/nfsgateway.py", line 54, in start
nfsgateway(action="start")
File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/HDFS/package/scripts/hdfs_nfsgateway.py", line 64, in nfsgateway
prepare_rpcbind()
File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/HDFS/package/scripts/hdfs_nfsgateway.py", line 55, in prepare_rpcbind
raise Fail("Failed to start rpcbind or portmap")
resource_management.core.exceptions.Fail: Failed to start rpcbind or portmap
- Solution: enable and start the rpcbind service on the host
[root@master ~]# systemctl enable rpcbind
[root@master ~]# systemctl start rpcbind
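Before retrying the NFSGateway start, it can help to confirm that rpcbind is actually accepting connections. The sketch below is an assumption-laden check, not an Ambari feature: `tcp_port_open` is a hypothetical helper, and it assumes rpcbind listens on its standard TCP port 111 on the local host.

```python
import socket


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False
```

For example, `tcp_port_open("127.0.0.1", 111)` should return True once rpcbind is running with its default configuration.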