HA

http://hadoop.apache.org/docs/r2.0.3-alpha/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html


Resetting and starting HDFS QJM

sbin/hadoop-daemon.sh start journalnode   (start the journalnode on both machines first)

Then start one namenode: sbin/hadoop-daemon.sh start namenode  (nn0)

On the second node, run the sync: hdfs namenode -initializeSharedEdits -force

sbin/hadoop-daemon.sh start namenode  (nn1)


After the first start, both namenodes turned out to be standby.

Run: # hdfs haadmin -failover --forcefence --forceactive  nn0 nn1

Error: forcefence and forceactive flags not supported with auto-failover enabled.

It turned out I had configured ZooKeeper and dfs.ha.automatic-failover.enabled:

    <property>
        <name>dfs.ha.automatic-failover.enabled</name> 
        <value>true</value>
    </property>
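
To confirm the value the Hadoop tooling actually sees for this key, hdfs getconf -confKey can be used. A small sketch: the helper names is_true and auto_failover_enabled are made up for this note, and whitespace is stripped defensively in case the value was written with padding.

```shell
# Return success if the argument is "true" once all whitespace is removed.
is_true() {
  v="$(printf '%s' "$1" | tr -d '[:space:]')"
  [ "$v" = "true" ]
}

# Ask the local Hadoop configuration whether automatic failover is enabled.
# Requires the hdfs command on PATH and a readable hdfs-site.xml.
auto_failover_enabled() {
  is_true "$(hdfs getconf -confKey dfs.ha.automatic-failover.enabled)"
}
```

On a configured node, something like `auto_failover_enabled && echo "auto-failover is on"` shows up front whether the force flags of haadmin will be rejected.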


So first bring up all the ZooKeeper nodes.

Then run bin/hdfs zkfc -formatZK. If not all of them are up, it reports a communication error.

/04 14:26:17 INFO ha.ActiveStandbyElector: Session connected.
14/08/04 14:26:17 INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/yh in ZK.

Once ZK is set up, the namenodes can be started with start-dfs.sh; the cluster automatically elects the namenode that started first as active.


Another error:

##########################################################################################

14/08/04 14:35:11 INFO common.Storage: Lock on /export/nn/in_use.lock acquired by nodename 9020@Axxxx
14/08/04 14:35:11 INFO impl.MetricsSystemImpl: Stopping NameNode metrics system...
14/08/04 14:35:11 INFO impl.MetricsSystemImpl: NameNode metrics system stopped.
14/08/04 14:35:11 INFO impl.MetricsSystemImpl: NameNode metrics system shutdown complete.
14/08/04 14:35:11 FATAL namenode.NameNode: Exception in namenode join
java.io.FileNotFoundException: No valid image files found
        at org.apache.hadoop.hdfs.server.namenode.FSImageTransactionalStorageInspector.getLatestImages(FSImageTransactionalStorageInspector.java:144)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:610)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:274)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:728)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:521)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:403)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:437)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:613)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:598)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1169)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1233)
14/08/04 14:35:11 INFO util.ExitUtil: Exiting with status 1
14/08/04 14:35:11 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at xxxx

java.io.IOException: Cannot start an HA namenode with name dirs that need recovery. Dir: Storage Directory /export/nn state: NOT_FORMATTED

hdfs namenode -initializeSharedEdits

hdfs namenode -format


Started it again, still an error:

#################################################################################################

14/08/04 14:53:09 INFO http.HttpServer: addJerseyResourcePackage: packageName=org.apache.hadoop.hdfs.server.namenode.web.resources;org.apache.hadoop.hdfs.web.resources, pathSpec=/webhdfs/v1/*
14/08/04 14:53:09 INFO http.HttpServer: HttpServer.start() threw a non Bind IOException
java.net.BindException: Port in use: xxxx:50070
        at org.apache.hadoop.http.HttpServer.openListener(HttpServer.java:730)
        at org.apache.hadoop.http.HttpServer.start(HttpServer.java:674)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeHttpServer.start(NameNodeHttpServer.java:173)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.startHttpServer(NameNode.java:556)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.startCommonServices(NameNode.java:488)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:451)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:613)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:598)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1169)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1233)
Caused by: java.net.BindException: Address already in use
        at sun.nio.ch.Net.bind(Native Method)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:126)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
        at org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
        at org.apache.hadoop.http.HttpServer.openListener(HttpServer.java:726)
        ... 9 more
        
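Why would the port be in use? Argh, it turned out that start-dfs.sh had been run before the cluster was fully configured, so a datanode was connecting to this namenode's port 50070. Go to the datanode and kill the process.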
[admin@A01-R06-I149-133 hadoop]$ netstat -an|grep 50070
tcp        0      0 ::ffff:xxxx:50070 ::ffff:xxxx:2888  TIME_WAIT   
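
To tell a real listener apart from leftover connections like the TIME_WAIT entry above, the netstat output can be filtered by state. A rough sketch: listening_on_port is a helper name invented here, and it assumes the Linux netstat -an column layout (local address in column 4, state in column 6).

```shell
# Read `netstat -an` style lines on stdin and print the local address of
# every socket that is in LISTEN state on the given port.
listening_on_port() {
  awk -v p=":$1" '$6 == "LISTEN" && $4 ~ (p "$") { print $4 }'
}
```

Usage would be along the lines of `netstat -an | listening_on_port 50070`. Empty output means nothing is listening on that port on this host, so any remaining entries are only connections in other states.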


hdfs namenode -bootstrapStandby   This command formats the local nn, so make sure this node really is the standby.
************************************************************/
14/08/04 15:06:36 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
14/08/04 15:06:37 WARN common.Util: Path /export/nn should be specified as a URI in configuration files. Please update hdfs configuration.
14/08/04 15:06:37 WARN common.Util: Path /export/nn should be specified as a URI in configuration files. Please update hdfs configuration.
=====================================================
About to bootstrap Standby ID nn0 from:
           Nameservice ID: yh
        Other Namenode ID: nn1
  Other NN's HTTP address: X.X.X.X:50070
  Other NN's IPC  address: X.X.X.X/X.X.X.X:8020
             Namespace ID: 1095059014
            Block pool ID: BP-446595942-X.X.X.X-1407134712765
               Cluster ID: CID-cc63d698-53f2-4efb-aa29-a55ddb93043d
           Layout version: -40
=====================================================
Re-format filesystem in Storage Directory /export/nn ? (Y or N) Y
14/08/04 15:06:41 INFO namenode.NNStorage: Storage directory /export/nn has been successfully formatted.
14/08/04 15:06:41 WARN common.Util: Path /export/nn should be specified as a URI in configuration files. Please update hdfs configuration.
14/08/04 15:06:41 WARN common.Util: Path /export/nn should be specified as a URI in configuration files. Please update hdfs configuration.
14/08/04 15:06:41 WARN client.QuorumJournalManager: Quorum journal URI 'qjournal://X.X.X.132:8485;X.X.X.X:8485/yh' has an even number of Journal Nodes specified. This is not recommended!
14/08/04 15:06:41 INFO namenode.TransferFsImage: Opening connection to http://X.X.X.X:50070/getimage?getimage=1&txid=0&storageInfo=-40:1095059014:0:CID-cc63d698-53f2-4efb-aa29-a55ddb93043d
14/08/04 15:06:41 INFO namenode.TransferFsImage: Transfer took 0.07s at 0.00 KB/s
14/08/04 15:06:41 INFO namenode.TransferFsImage: Downloaded file fsimage.ckpt_0000000000000000000 size 120 bytes.
14/08/04 15:06:41 INFO util.ExitUtil: Exiting with status 0
14/08/04 15:06:41 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at X.X.X-132.jd.local/X.X.X.132
************************************************************/ 
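
The log above warns that the quorum journal URI lists an even number of Journal Nodes. A quick way to check a URI before putting it into the config is sketched below; journalnode_count and is_odd_quorum are hypothetical helper names, and the host names in the usage note are placeholders.

```shell
# Count the JournalNode addresses in a qjournal://host:port;host:port/id URI.
journalnode_count() {
  hosts="${1#qjournal://}"   # drop the scheme
  hosts="${hosts%%/*}"       # drop the /journalId suffix
  printf '%s\n' "$hosts" | tr ';' '\n' | grep -c .
}

# Succeed only when the URI lists an odd number of JournalNodes.
is_odd_quorum() {
  [ $(( $(journalnode_count "$1") % 2 )) -eq 1 ]
}
```

For example, `is_odd_quorum 'qjournal://jn1:8485;jn2:8485;jn3:8485/yh'` succeeds, while a two-node URI like the one in the log fails the check.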


With the cluster configured, try a switchover:
[admin@A01-R06-I149-132 hadoop-2.0.0-cdh4.5.0]$ hdfs haadmin -failover --forcefence --forceactive  nn0 nn1
forcefence and forceactive flags not supported with auto-failover enabled.
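Damn, manual switching is not allowed. So what now?

Kill the active namenode, then look at the standby node: it has become active.

How do you switch back and forth by hand in automatic mode?
The failed node can rejoin via hdfs namenode -bootstrapStandby. This wipes the local nn contents, though you can of course answer N at the prompt.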
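
As far as I can tell from the Apache HA documentation linked at the top, the plain hdfs haadmin -failover command (without --forcefence and --forceactive) still works when automatic failover is configured and performs a coordinated failover. Whether the CDH build used here behaves the same way is an assumption I have not verified. The sketch below only prints the commands by default; run, show_states and graceful_failover are names made up for this note.

```shell
# DRY_RUN=1 (the default) prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Ask each namenode for its HA state (nn0 and nn1 are the IDs from this post).
show_states() {
  for nn in nn0 nn1; do
    run hdfs haadmin -getServiceState "$nn"
  done
}

# Request a failover from $1 to $2 without the force flags.
graceful_failover() {
  run hdfs haadmin -failover "$1" "$2"
}
```

Calling `show_states`, then `graceful_failover nn0 nn1`, then `show_states` again with DRY_RUN=0 would show whether the roles actually swapped.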
