http://hadoop.apache.org/docs/r2.0.3-alpha/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html
Resetting and starting HDFS QJM HA
sbin/hadoop-daemon.sh start journalnode  (start the JournalNodes on both nodes first)
Then start one NameNode: sbin/hadoop-daemon.sh start namenode  (nn0)
On the second node, sync the shared edits: hdfs namenode -initializeSharedEdits -force
Then start it as well: sbin/hadoop-daemon.sh start namenode  (nn1)
On first startup, both NameNodes came up as standby.
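A quick way to confirm the state from the shell (nn0/nn1 are the NameNode IDs from dfs.ha.namenodes in this setup):

```shell
hdfs haadmin -getServiceState nn0   # reports "standby" here
hdfs haadmin -getServiceState nn1   # also "standby"
```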
So I ran: # hdfs haadmin -failover --forcefence --forceactive nn0 nn1
Error: "forcefence and forceactive flags not supported with auto-failover enabled." The cause: I had configured ZooKeeper and enabled dfs.ha.automatic-failover.enabled in hdfs-site.xml.
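For reference, the relevant settings look roughly like this; the ZooKeeper hostnames below are placeholders, not the real ones from this cluster:

```xml
<!-- hdfs-site.xml: let ZKFC manage active/standby election -->
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
<!-- core-site.xml: the ZooKeeper ensemble the ZKFCs talk to (hosts are examples) -->
<property>
  <name>ha.zookeeper.quorum</name>
  <value>zk1:2181,zk2:2181,zk3:2181</value>
</property>
```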
Then run bin/hdfs zkfc -formatZK; if the ZooKeeper ensemble is not fully up, this fails with a communication error.
/04 14:26:17 INFO ha.ActiveStandbyElector: Session connected.
14/08/04 14:26:17 INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/yh in ZK.
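After the znode is created, a ZKFC daemon must also be running on each NameNode host for the election to happen; start-dfs.sh starts it for you, or it can be started by hand:

```shell
# on each NameNode host: start the ZooKeeper failover controller
sbin/hadoop-daemon.sh start zkfc
```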
With ZK formatted, the NameNodes can now be started via start-dfs.sh, and the cluster automatically elects the active NameNode at startup.
But then another error:
##########################################################################################
14/08/04 14:35:11 INFO common.Storage: Lock on /export/nn/in_use.lock acquired by nodename 9020@Axxxx
14/08/04 14:35:11 INFO impl.MetricsSystemImpl: Stopping NameNode metrics system...
14/08/04 14:35:11 INFO impl.MetricsSystemImpl: NameNode metrics system stopped.
14/08/04 14:35:11 INFO impl.MetricsSystemImpl: NameNode metrics system shutdown complete.
14/08/04 14:35:11 FATAL namenode.NameNode: Exception in namenode join
java.io.FileNotFoundException: No valid image files found
at org.apache.hadoop.hdfs.server.namenode.FSImageTransactionalStorageInspector.getLatestImages(FSImageTransactionalStorageInspector.java:144)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:610)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:274)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:728)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:521)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:403)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:437)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:613)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:598)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1169)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1233)
14/08/04 14:35:11 INFO util.ExitUtil: Exiting with status 1
14/08/04 14:35:11 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at xxxx
************************************************************/
The underlying cause turned out to be an unformatted name directory:
java.io.IOException: Cannot start an HA namenode with name dirs that need recovery. Dir: Storage Directory /export/nn state: NOT_FORMATTED
The fix: format the NameNode, then initialize the shared edits:
hdfs namenode -format
hdfs namenode -initializeSharedEdits
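A sanity check worth doing after formatting, assuming /export/nn is the configured dfs.namenode.name.dir:

```shell
ls /export/nn/current           # expect VERSION and fsimage_* files now
cat /export/nn/current/VERSION  # the clusterID here must match the JournalNodes'
```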
Starting it again produced yet another error:
#################################################################################################
14/08/04 14:53:09 INFO http.HttpServer: addJerseyResourcePackage: packageName=org.apache.hadoop.hdfs.server.namenode.web.resources;org.apache.hadoop.hdfs.web.resources, pathSpec=/webhdfs/v1/*
14/08/04 14:53:09 INFO http.HttpServer: HttpServer.start() threw a non Bind IOException
java.net.BindException: Port in use: xxxx:50070
at org.apache.hadoop.http.HttpServer.openListener(HttpServer.java:730)
at org.apache.hadoop.http.HttpServer.start(HttpServer.java:674)
at org.apache.hadoop.hdfs.server.namenode.NameNodeHttpServer.start(NameNodeHttpServer.java:173)
at org.apache.hadoop.hdfs.server.namenode.NameNode.startHttpServer(NameNode.java:556)
at org.apache.hadoop.hdfs.server.namenode.NameNode.startCommonServices(NameNode.java:488)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:451)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:613)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:598)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1169)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1233)
Caused by: java.net.BindException: Address already in use
at sun.nio.ch.Net.bind(Native Method)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:126)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
at org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
at org.apache.hadoop.http.HttpServer.openListener(HttpServer.java:726)
... 9 more
Why was port 50070 in use? It turned out start-dfs.sh had been run before the cluster was fully configured, and a DataNode process was connected to this NameNode's port 50070; go to the DataNode and kill that process.
[admin@A01-R06-I149-133 hadoop]$ netstat -an|grep 50070
tcp 0 0 ::ffff:xxxx:50070 ::ffff:xxxx:2888 TIME_WAIT
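A TIME_WAIT entry by itself holds no process, so to find the actual owner of the port, something like this helps (the -p flag needs root):

```shell
sudo netstat -tlnp | grep 50070   # shows the listener with its PID/program name
# or: sudo lsof -i :50070
# then kill the offending PID on the DataNode side
```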
hdfs namenode -bootstrapStandby: this command formats the local NameNode storage, ensuring this node comes up as the standby.
14/08/04 15:06:36 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
14/08/04 15:06:37 WARN common.Util: Path /export/nn should be specified as a URI in configuration files. Please update hdfs configuration.
14/08/04 15:06:37 WARN common.Util: Path /export/nn should be specified as a URI in configuration files. Please update hdfs configuration.
=====================================================
About to bootstrap Standby ID nn0 from:
Nameservice ID: yh
Other Namenode ID: nn1
Other NN's HTTP address: X.X.X.X:50070
Other NN's IPC address: X.X.X.X/X.X.X.X:8020
Namespace ID: 1095059014
Block pool ID: BP-446595942-X.X.X.X-1407134712765
Cluster ID: CID-cc63d698-53f2-4efb-aa29-a55ddb93043d
Layout version: -40
=====================================================
Re-format filesystem in Storage Directory /export/nn ? (Y or N) Y
14/08/04 15:06:41 INFO namenode.NNStorage: Storage directory /export/nn has been successfully formatted.
14/08/04 15:06:41 WARN common.Util: Path /export/nn should be specified as a URI in configuration files. Please update hdfs configuration.
14/08/04 15:06:41 WARN common.Util: Path /export/nn should be specified as a URI in configuration files. Please update hdfs configuration.
14/08/04 15:06:41 WARN client.QuorumJournalManager: Quorum journal URI 'qjournal://X.X.X.132:8485;X.X.X.X:8485/yh' has an even number of Journal Nodes specified. This is not recommended!
14/08/04 15:06:41 INFO namenode.TransferFsImage: Opening connection to http://X.X.X.X:50070/getimage?getimage=1&txid=0&storageInfo=-40:1095059014:0:CID-cc63d698-53f2-4efb-aa29-a55ddb93043d
14/08/04 15:06:41 INFO namenode.TransferFsImage: Transfer took 0.07s at 0.00 KB/s
14/08/04 15:06:41 INFO namenode.TransferFsImage: Downloaded file fsimage.ckpt_0000000000000000000 size 120 bytes.
14/08/04 15:06:41 INFO util.ExitUtil: Exiting with status 0
14/08/04 15:06:41 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at X.X.X-132.jd.local/X.X.X.132
************************************************************/
With the cluster properly configured, try a failover:
[admin@A01-R06-I149-132 hadoop-2.0.0-cdh4.5.0]$ hdfs haadmin -failover --forcefence --forceactive nn0 nn1
forcefence and forceactive flags not supported with auto-failover enabled.
So manual failover is not allowed in auto mode; what to do instead?
Kill the active NameNode, and the standby node becomes active.
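So the "manual" switch under automatic failover is simply to stop the active side and let ZooKeeper elect the other:

```shell
# on the current active NameNode:
sbin/hadoop-daemon.sh stop namenode
# verify the old standby took over:
hdfs haadmin -getServiceState nn1
# restart the stopped node; it rejoins as standby:
sbin/hadoop-daemon.sh start namenode
```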
That answers how to switch the two over by hand while in automatic mode.
Recover the failed node with hdfs namenode -bootstrapStandby; note that this wipes the local NameNode contents (you can also answer N at the prompt to keep them).
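To skip the Y/N prompt when re-syncing a recovered node, bootstrapStandby also accepts -force (and -nonInteractive), assuming the local dirs really should be overwritten:

```shell
hdfs namenode -bootstrapStandby -force   # overwrites the local name dirs
sbin/hadoop-daemon.sh start namenode     # the node comes back as standby
```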