Hadoop HA: Rebuilding the Standby NameNode

weixin_33912445

Tags: big data operations, java

Original article: https://my.oschina.net/cwalet/blog/680572

Symptom: at first, the NameNode log was flooded with the following error message:

2014-01-27 17:55:59,388 WARN resources.ExceptionHandler (ExceptionHandler.java:toResponse(92)) - INTERNAL_SERVER_ERROR

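What followed resembles the case described in "Hadoop运维笔记之 Namenode异常停止后无法正常启动" (Hadoop operations notes: the NameNode cannot start normally after stopping abnormally).

It is the same bug reported against Hadoop-2.10-beta (testNamenodeRestart fails with NullPointerException in trunk):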

This is actually due to a bug in the NN. The http services are started before the image is loaded, the edits are processed, and the rpc server is started. During image loading and edits processing, webhdfs will NPE on the rpc server.
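To gauge how badly the log is being flooded, the warning lines can simply be counted. The following is a minimal sketch; the function name `count_webhdfs_errors` and the log path in the usage comment are hypothetical and not part of the original post.

```shell
#!/usr/bin/env bash
# Count WebHDFS INTERNAL_SERVER_ERROR warnings in a NameNode log file.
# Usage (the log path is a hypothetical example):
#   count_webhdfs_errors /var/log/hadoop/namenode.log
count_webhdfs_errors() {
  local log_file="$1"
  # grep -c prints the number of matching lines. It exits with status 1
  # when the count is 0, so "|| true" keeps callers using "set -e" alive.
  grep -c 'WARN resources.ExceptionHandler .*INTERNAL_SERVER_ERROR' "$log_file" || true
}
```

A count that keeps growing while the NameNode is still loading the image is consistent with the behavior described in the quoted bug report.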

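Since the NameNode could not be started, the only option was to rebuild the Standby. The steps are as follows:

1. First, run the following commands on the Active, then manually back up the entire name directory: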

# Stop the automatic failover controller (ZKFC)
hadoop-daemon.sh stop zkfc

# Enter safe mode
hdfs dfsadmin -safemode enter

# Flush the edit log into the fsimage
hdfs dfsadmin -saveNamespace
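The manual backup of the name directory can be done in several ways; one possible sketch is a timestamped tarball. The helper name `backup_dir` and the `/backup` target are hypothetical. `/data/hadoop/name` is the name directory path that appears in the `scp` command later in this post.

```shell
#!/usr/bin/env bash
# Archive a directory into a timestamped tarball and print the archive path.
# Usage (the /backup destination is a hypothetical example):
#   backup_dir /data/hadoop/name /backup
backup_dir() {
  local src="$1" dest="$2"
  local stamp archive
  stamp="$(date +%Y%m%d-%H%M%S)"
  archive="${dest}/$(basename "$src")-${stamp}.tar.gz"
  mkdir -p "$dest" || return 1
  # -C changes into the parent directory first, so the archive stores a
  # relative "name/..." tree instead of absolute paths.
  tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
  echo "$archive"
}
```

Taking the backup after `-saveNamespace` means the archive contains the freshly written fsimage.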

2. Next, on the Standby, back up the entire name and journal directories first, then run:

hadoop-daemon.sh stop zkfc
hdfs namenode -bootstrapStandby

If it fails with:

FATAL ha.BootstrapStandby: Unable to read transaction ids 10-100 from the configured shared edits storage qjournal://1.1.1.1:8485;1.1.1.2:8485/sec-hdfs-cluster. Please copy these logs into the shared edits storage or call saveNamespace on the active node.
Error: Gap in transactions. Expected to be able to read up until at least txid 10 but unable to find any edit logs containing txid 10

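then copy the entire name directory from the Active to the Standby and start the NameNode directly:

# On the Active: copy the name directory to the Standby
scp -r /data/hadoop/name/ $standby_ip:/data/hadoop
# On the Standby: start the NameNode
hadoop-daemon.sh start namenode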

3. Note: at this point there is no need to run "bootstrapStandby"; doing so would rebuild and wipe the name directory that was just copied over.
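The decision in step 2 (fall back to copying the name directory when bootstrapStandby hits a transaction gap) can be sketched as a small check on the captured command output. The function names `missing_txid_range` and `has_txid_gap` are hypothetical helpers, and the matched strings are taken from the error message quoted above.

```shell
#!/usr/bin/env bash
# Print the missing txid range (for example "10-100") found in
# bootstrapStandby output read from stdin. Prints nothing when the
# "Unable to read transaction ids" error is absent.
missing_txid_range() {
  sed -n 's/.*Unable to read transaction ids \([0-9][0-9]*-[0-9][0-9]*\) from.*/\1/p'
}

# Return 0 when the output on stdin contains the transaction gap error.
has_txid_gap() {
  grep -q -e 'Unable to read transaction ids' -e 'Gap in transactions'
}

# Example usage on the Standby (the log path is a hypothetical example):
#   hdfs namenode -bootstrapStandby 2>&1 | tee /tmp/bootstrap.log
#   if has_txid_gap < /tmp/bootstrap.log; then
#     echo "gap at txids $(missing_txid_range < /tmp/bootstrap.log): copy the name directory instead"
#   fi
```

The steps above end once the Standby NameNode is started. Presumably the Active still has to leave safe mode (`hdfs dfsadmin -safemode leave`) and ZKFC has to be started again on both nodes (`hadoop-daemon.sh start zkfc`), although the original post does not say so.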
