Fixing a Hadoop SNN (SecondaryNameNode) failure

The cluster had been in production for a while when the following errors showed up in the SecondaryNameNode (SNN) log:

2013-10-28 16:38:16,280 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection to http://0.0.0.0:50070/getimage?getimage=1
2013-10-28 16:38:16,281 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hdfs cause:java.net.ConnectException: Connection refused
2013-10-28 16:38:16,281 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in doCheckpoint: 
2013-10-28 16:38:16,281 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
        at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
        at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
        at java.net.SocksSocketImpl.connect(Unknown Source)
        at java.net.Socket.connect(Unknown Source)
        at java.net.Socket.connect(Unknown Source)
        at sun.net.NetworkClient.doConnect(Unknown Source)
        at sun.net.www.http.HttpClient.openServer(Unknown Source)
        at sun.net.www.http.HttpClient.openServer(Unknown Source)
        at sun.net.www.http.HttpClient.<init>(Unknown Source)
        at sun.net.www.http.HttpClient.New(Unknown Source)
        at sun.net.www.http.HttpClient.New(Unknown Source)
        at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(Unknown Source)
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect(Unknown Source)
        at sun.net.www.protocol.http.HttpURLConnection.connect(Unknown Source)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
        at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:172)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$4.run(SecondaryNameNode.java:404)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$4.run(SecondaryNameNode.java:393)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Unknown Source)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.downloadCheckpointFiles(SecondaryNameNode.java:393)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:494)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:369)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:332)
        at java.lang.Thread.run(Unknown Source)


At the same time, the SNN's checkpoint directory turned out to be empty.
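A quick way to confirm this is to list the directories configured in fs.checkpoint.dir on the SNN host; a minimal sketch, using the /opt/dfs/snn and /data1-3/dfs/snn paths that show up in the log further down:

  # on the SNN host: list the fs.checkpoint.dir directories
  ls -l /opt/dfs/snn/current /data1/dfs/snn/current /data2/dfs/snn/current /data3/dfs/snn/current
  # a healthy SNN keeps fsimage and edits files here; in this case the directories were empty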

The fix:

Add the following property to hdfs-site.xml:

  <property>
    <name>dfs.http.address</name>
    <value>192.168.1.81:50070</value>
  </property>

After startup the SNN needs to request the image and edits files from the NameNode over HTTP; because this address had not been configured, it fell back to the default 0.0.0.0:50070 and the connection was refused.
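It is worth verifying the new address from the SNN host before going further; a minimal check, assuming curl is available (the getimage URL is the same one that appears in the SNN log, and /tmp/fsimage.ckpt is just an illustrative output path):

  # the NN web UI should answer on 50070
  curl -s -o /dev/null -w "%{http_code}\n" http://192.168.1.81:50070/
  # pulling the image servlet directly should also work and write the current fsimage locally
  curl -s -o /tmp/fsimage.ckpt "http://192.168.1.81:50070/getimage?getimage=1"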

With the configuration in place, the SNN log then showed:

2013-10-28 17:01:10,777 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in doCheckpoint: 
2013-10-28 17:01:10,777 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: java.io.FileNotFoundException: http://192.168.1.81:50070/getimage?putimage=1&port=50090&machine=0.0.0.0&token=-32:797783056:0:1374228512000:1374228205935&newChecksum=06ed8d62329e47f1863a0230215c8317
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
        at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:172)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.putFSImage(SecondaryNameNode.java:435)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:499)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:369)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:332)
        at java.lang.Thread.run(Unknown Source)


Because the SNN had been down for so long, the checkpoint position is off right after it comes back up, which is what causes this error on the first checkpoint attempt.
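If you would rather not wait for the next scheduled checkpoint (fs.checkpoint.period, 3600 seconds by default), Hadoop 1.x can also run one on demand; a rough sketch, run as the hdfs user on the SNN host, with the running daemon stopped first so the one-off run does not collide with it over the checkpoint directories:

  hadoop-daemon.sh stop secondarynamenode
  hadoop secondarynamenode -checkpoint force   # performs a single checkpoint and exits
  hadoop-daemon.sh start secondarynamenode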

In my case it sorted itself out: after a while the log returned to normal, and the checkpoint directories were populated with the synchronized edit log and image files:

2013-10-28 17:06:10,858 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection to http://192.168.1.81:50070/getimage?getedit=1
2013-10-28 17:06:10,861 INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Downloaded file edits size 60832 bytes.
2013-10-28 17:06:10,861 INFO org.apache.hadoop.hdfs.util.GSet: VM type       = 64-bit
2013-10-28 17:06:10,861 INFO org.apache.hadoop.hdfs.util.GSet: 2% max memory = 582.5425 MB
2013-10-28 17:06:10,861 INFO org.apache.hadoop.hdfs.util.GSet: capacity      = 2^26 = 67108864 entries
2013-10-28 17:06:10,861 INFO org.apache.hadoop.hdfs.util.GSet: recommended=67108864, actual=67108864
2013-10-28 17:06:11,259 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hdfs
2013-10-28 17:06:11,259 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2013-10-28 17:06:11,259 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=true
2013-10-28 17:06:11,259 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: dfs.block.invalidate.limit=100
2013-10-28 17:06:11,259 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: The dfs.support.append option is in your configuration, however append is not supported. This configuration option is no longer required to enable sync.
2013-10-28 17:06:11,259 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
2013-10-28 17:06:11,260 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names occuring more than 10 times 
2013-10-28 17:06:11,264 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 7369
2013-10-28 17:06:11,348 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 10
2013-10-28 17:06:11,386 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: EOF of /opt/dfs/snn/current/edits, reached end of edit log Number of transactions found: 321.  Bytes read: 60832
2013-10-28 17:06:11,386 INFO org.apache.hadoop.hdfs.server.common.Storage: Edits file /opt/dfs/snn/current/edits of size 60832 edits # 321 loaded in 0 seconds.
2013-10-28 17:06:11,388 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions: 0 Total time for transactions(ms): 0Number of transactions batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0 0 0 0 
2013-10-28 17:06:11,394 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: closing edit log: position=60832, editlog=/opt/dfs/snn/current/edits
2013-10-28 17:06:11,394 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: close success: truncate to 60832, editlog=/opt/dfs/snn/current/edits
2013-10-28 17:06:11,400 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: closing edit log: position=60832, editlog=/data1/dfs/snn/current/edits
2013-10-28 17:06:11,400 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: close success: truncate to 60832, editlog=/data1/dfs/snn/current/edits
2013-10-28 17:06:11,406 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: closing edit log: position=60832, editlog=/data2/dfs/snn/current/edits
2013-10-28 17:06:11,407 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: close success: truncate to 60832, editlog=/data2/dfs/snn/current/edits
2013-10-28 17:06:11,412 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: closing edit log: position=60832, editlog=/data3/dfs/snn/current/edits
2013-10-28 17:06:11,412 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: close success: truncate to 60832, editlog=/data3/dfs/snn/current/edits
2013-10-28 17:06:11,433 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 1162298 saved in 0 seconds.
2013-10-28 17:06:11,444 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: closing edit log: position=4, editlog=/opt/dfs/snn/current/edits
2013-10-28 17:06:11,445 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: close success: truncate to 4, editlog=/opt/dfs/snn/current/edits
 
The matching setting also needs to be added on the NameNode side, so that the two ends can reach each other:
  <property>
    <name>dfs.secondary.http.address</name>
    <value>secondarynamenode:50090</value>
  </property>
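Both daemons only read hdfs-site.xml at startup, so they need a restart for the new addresses to take effect; a sketch using the stock Hadoop 1.x daemon scripts (assuming $HADOOP_HOME/bin is on the PATH and the daemons run as hdfs; note that restarting the NameNode briefly interrupts HDFS service):

  # on the NameNode host
  hadoop-daemon.sh stop namenode
  hadoop-daemon.sh start namenode

  # on the SecondaryNameNode host
  hadoop-daemon.sh stop secondarynamenode
  hadoop-daemon.sh start secondarynamenode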

 