Cluster Issues

HA ResourceManager fails to start
Start it separately on the node where it failed:

./yarn-daemon.sh start resourcemanager

hadoop-daemon.sh start|stop namenode|datanode|journalnode
yarn-daemon.sh start|stop resourcemanager|nodemanager

Call From master/192.168.128.135 to master:8485 failed on connection exception: java.net.ConnectException: Connection
The journalnodes (port 8485) are started after the namenode. By default, if the journalnodes are still not up 10 s after the namenode starts (maxRetries=10, sleepTime=1000 ms), the error above is reported.
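Besides starting the journalnodes first (as in the sequence below), the retry window can be widened so the namenode waits longer for port 8485. A sketch for core-site.xml; the values (100 retries, 10 s apart) are an assumption, not from the original post:

<property>
  <name>ipc.client.connect.max.retries</name>
  <value>100</value>  <!-- default 10 -->
</property>
<property>
  <name>ipc.client.connect.retry.interval</name>
  <value>10000</value>  <!-- milliseconds; default 1000 -->
</property>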

# Start HDFS; note that some of these commands run on multiple nodes.
hadoop-daemons.sh start journalnode
hadoop-daemon.sh start namenode  # run on every namenode
hadoop-daemon.sh start zkfc      # run on every namenode
hadoop-daemons.sh start datanode
# Start YARN
start-yarn.sh

Check the cluster status

hadoop dfsadmin -report

Start the JobHistory server

sbin/mr-jobhistory-daemon.sh start historyserver

Hadoop JobHistory
https://www.cnblogs.com/luogankun/p/4019303.html

Port 8088 (the YARN web UI): to reach the active node, use the standby node's hostname:port; to reach the standby node, use the active node's IP:port (the standby RM redirects to the active).

Check the HA state of a ResourceManager:

yarn rmadmin -getServiceState rm1

ZKFC fails to start

hdfs zkfc -formatZK

The auxService:spark_shuffle does not exist
https://github.com/jaceklaskowski/mastering-apache-spark-book/blob/master/yarn/spark-yarn-YarnShuffleService.adoc
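The usual fix, sketched here on the assumption that the cluster matches the linked write-up, is to register Spark's external shuffle service with every NodeManager in yarn-site.xml, put spark-<version>-yarn-shuffle.jar on the NodeManager classpath, and restart the NodeManagers:

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>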

java.io.IOException: Unable to create data directory

When ZooKeeper reports this, the cause is missing permissions: the process cannot write to its data directory.
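A minimal fix sketch, assuming the dataDir in zoo.cfg is /data/zookeeper and ZooKeeper runs as user zookeeper (both hypothetical; substitute your own):

chown -R zookeeper:zookeeper /data/zookeeper  # hypothetical path and user
chmod -R u+rwX /data/zookeeper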

The ResourceManager needs to be started manually on its own:
http://blog.csdn.net/dr_guo/article/details/50975851

OpenJDK 64-Bit Server VM warning: You have loaded library /home/vpe.cripac/softwares/hadoop/hadoop-2.7.3/lib/native/libhadoop.so which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.

Add the following to the environment variables (e.g. in ~/.bashrc), then source the file:

export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_PREFIX}/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib"

Hadoop version upgrade

Start one namenode with the -upgrade option:

http://blog.csdn.net/knowledgeaaa/article/details/51330890
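A sketch of the sequence (finalize only after verifying the upgraded cluster, because finalizing discards the pre-upgrade rollback image):

hadoop-daemon.sh start namenode -upgrade
# once everything checks out:
hdfs dfsadmin -finalizeUpgrade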

java.io.IOException: Filesystem closed

Set fs.hdfs.impl.disable.cache to true in core-site.xml.
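The corresponding entry:

<property>
  <name>fs.hdfs.impl.disable.cache</name>
  <!-- every FileSystem.get() now returns a fresh instance, so one caller's close() no longer invalidates others -->
  <value>true</value>
</property>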

ERROR org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService: Most of the disks failed. 1/1 local-dirs are bad:

https://blog.csdn.net/kaede1209/article/details/77881799

A NodeManager process is running but the node does not show up in YARN, while the other nodes are fine. This is usually caused by the node running short of memory or disk space; check memory usage and free some up, and the node comes back.


yarn node -list -all  # check the status of all NodeManagers in the cluster

NodeManager is unhealthy, local-dirs are bad

In yarn-site.xml, relax the disk health checker thresholds:

<property>
  <name>yarn.nodemanager.disk-health-checker.min-healthy-disks</name>
  <value>0.25</value>
</property>
<property>
  <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
  <value>95.0</value>
</property>

libopencv_core.so.2.4: cannot open shared object file
https://blog.csdn.net/zhuquan945/article/details/53768465
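A common fix for this class of error is to let the dynamic loader find the OpenCV shared libraries; a sketch assuming an install prefix of /usr/local/lib (hypothetical):

echo '/usr/local/lib' | sudo tee /etc/ld.so.conf.d/opencv.conf
sudo ldconfig  # rebuild the shared-library cache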

Hadoop configuration files
Setting multiple paths for fs.checkpoint.dir and dfs.name.dir provides redundant copies of the metadata, while setting multiple paths for dfs.data.dir spreads blocks across disks for load balancing.
https://blog.csdn.net/jediael_lu/article/details/38680013
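A sketch of the difference; the /data1 and /data2 paths are hypothetical:

<property>
  <name>dfs.name.dir</name>
  <!-- each path keeps a full, identical copy of the namenode metadata -->
  <value>/data1/dfs/name,/data2/dfs/name</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <!-- blocks are spread across the paths; each block is stored once -->
  <value>/data1/dfs/data,/data2/dfs/data</value>
</property>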

Directory /hadoop_dirs/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible

The /hadoop_dirs/dfs/name directory is missing; create it.
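For example, as the user running HDFS:

mkdir -p /hadoop_dirs/dfs/name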

Do not delete userlogs while the cluster is running; doing so drives the cluster into safe mode, and it cannot leave it.
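If the cluster is already stuck, forcing it out of safe mode can be attempted, though per the note above it may not succeed in this case:

hdfs dfsadmin -safemode leave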

File /node_labels/nodelabel.mirror.writing could only be replicated to 0 nodes instead of minReplication (=1)

Disabling the firewall fixes it (the datanodes could not be reached, so no replica could be written).
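On CentOS 7 with firewalld (an assumption about the distribution), run on every node:

systemctl stop firewalld
systemctl disable firewalld  # keep it off across reboots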

The /tmp directory holds the pid files for Hadoop, Spark, Kafka, and so on, so do not delete it carelessly. If rm -rf * has already been run there, find the Java processes with ps aux | grep java, kill them with kill -9, and the cluster will restart without ports being held.

After renaming the cluster, ZooKeeper entries and datanodes can no longer be found, so it is best not to rename it.

Unable to start failover controller. Parent znode does not exist

The parent znode that stores the HA state is missing; reformat it in ZooKeeper with hdfs zkfc -formatZK (as above) and restart the ZKFC.

Initialization failed for Block pool (Datanode Uuid unassigned)

Delete the data, tmp, and namenode directories, then reformat the namenode.
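The reformat step, for reference (this wipes all HDFS metadata, so it is only acceptable on a disposable cluster):

hdfs namenode -format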

Fixing the "no namenode to stop" error when shutting down Hadoop:
https://blog.csdn.net/GYQJN/article/details/50805472

Hadoop pid files and how to change their path:
https://blog.csdn.net/qq_37408712/article/details/80954615
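Moving the pid files out of /tmp avoids the rm -rf problem described earlier; a sketch with /var/hadoop/pids as a hypothetical target (create it and make it writable by the Hadoop user first):

# in hadoop-env.sh
export HADOOP_PID_DIR=/var/hadoop/pids
# in yarn-env.sh
export YARN_PID_DIR=/var/hadoop/pids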
