The cluster had been running for a while when the following errors showed up in the SecondaryNameNode (SNN) log:
2013-10-28 16:38:16,280 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection to http://0.0.0.0:50070/getimage?getimage=1
2013-10-28 16:38:16,281 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hdfs cause:java.net.ConnectException: Connection refused
2013-10-28 16:38:16,281 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in doCheckpoint:
2013-10-28 16:38:16,281 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
at java.net.SocksSocketImpl.connect(Unknown Source)
at java.net.Socket.connect(Unknown Source)
at java.net.Socket.connect(Unknown Source)
at sun.net.NetworkClient.doConnect(Unknown Source)
at sun.net.www.http.HttpClient.openServer(Unknown Source)
at sun.net.www.http.HttpClient.openServer(Unknown Source)
at sun.net.www.http.HttpClient.<init>(Unknown Source)
at sun.net.www.http.HttpClient.New(Unknown Source)
at sun.net.www.http.HttpClient.New(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection.connect(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:172)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$4.run(SecondaryNameNode.java:404)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$4.run(SecondaryNameNode.java:393)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Unknown Source)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.downloadCheckpointFiles(SecondaryNameNode.java:393)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:494)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:369)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:332)
at java.lang.Thread.run(Unknown Source)
At the same time, the SNN's checkpoint directory turned out to be empty.

The fix: add the following to hdfs-site.xml:
<property>
  <name>dfs.http.address</name>
  <value>192.168.1.81:50070</value>
</property>
After startup, the SNN has to fetch the fsimage and edits files from the NameNode over HTTP. Because dfs.http.address was not configured, it fell back to the default 0.0.0.0:50070 (visible in the first log line above), so the connection was refused.
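As a quick sanity check, a small helper (hypothetical, not part of Hadoop) can parse a Hadoop-style *-site.xml and verify that dfs.http.address is actually set to a reachable NameNode address rather than the 0.0.0.0 default:

```python
import xml.etree.ElementTree as ET

def get_conf_value(xml_text, key):
    """Return the <value> for the given <name> in a Hadoop *-site.xml, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == key:
            return prop.findtext("value")
    return None

sample = """
<configuration>
  <property>
    <name>dfs.http.address</name>
    <value>192.168.1.81:50070</value>
  </property>
</configuration>
"""

addr = get_conf_value(sample, "dfs.http.address")
# An unset key or a 0.0.0.0 default is exactly what caused
# the ConnectException in the SNN log above.
assert addr is not None and not addr.startswith("0.0.0.0")
print(addr)
```

Running the same check on the node's real hdfs-site.xml (and on dfs.secondary.http.address, see below) catches this misconfiguration before the first checkpoint fails.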
After applying the configuration, the SNN log showed:
2013-10-28 17:01:10,777 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in doCheckpoint:
2013-10-28 17:01:10,777 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: java.io.FileNotFoundException: http://192.168.1.81:50070/getimage?putimage=1&port=50090&machine=0.0.0.0&token=-32:797783056:0:1374228512000:1374228205935&newChecksum=06ed8d62329e47f1863a0230215c8317
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:172)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.putFSImage(SecondaryNameNode.java:435)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:499)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:369)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:332)
at java.lang.Thread.run(Unknown Source)
This happened because the SNN had been down for a long time, so its checkpoint state was out of sync with the NameNode right after it came back up.

After a while the log returned to normal, and the corresponding directories now contained the synchronized edits and fsimage files:
2013-10-28 17:06:10,858 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection to http://192.168.1.81:50070/getimage?getedit=1
2013-10-28 17:06:10,861 INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Downloaded file edits size 60832 bytes.
2013-10-28 17:06:10,861 INFO org.apache.hadoop.hdfs.util.GSet: VM type = 64-bit
2013-10-28 17:06:10,861 INFO org.apache.hadoop.hdfs.util.GSet: 2% max memory = 582.5425 MB
2013-10-28 17:06:10,861 INFO org.apache.hadoop.hdfs.util.GSet: capacity = 2^26 = 67108864 entries
2013-10-28 17:06:10,861 INFO org.apache.hadoop.hdfs.util.GSet: recommended=67108864, actual=67108864
2013-10-28 17:06:11,259 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hdfs
2013-10-28 17:06:11,259 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2013-10-28 17:06:11,259 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=true
2013-10-28 17:06:11,259 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: dfs.block.invalidate.limit=100
2013-10-28 17:06:11,259 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: The dfs.support.append option is in your configuration, however append is not supported. This configuration option is no longer required to enable sync.
2013-10-28 17:06:11,259 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
2013-10-28 17:06:11,260 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names occuring more than 10 times
2013-10-28 17:06:11,264 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 7369
2013-10-28 17:06:11,348 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 10
2013-10-28 17:06:11,386 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: EOF of /opt/dfs/snn/current/edits, reached end of edit log Number of transactions found: 321. Bytes read: 60832
2013-10-28 17:06:11,386 INFO org.apache.hadoop.hdfs.server.common.Storage: Edits file /opt/dfs/snn/current/edits of size 60832 edits # 321 loaded in 0 seconds.
2013-10-28 17:06:11,388 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions: 0 Total time for transactions(ms): 0Number of transactions batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0 0 0 0
2013-10-28 17:06:11,394 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: closing edit log: position=60832, editlog=/opt/dfs/snn/current/edits
2013-10-28 17:06:11,394 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: close success: truncate to 60832, editlog=/opt/dfs/snn/current/edits
2013-10-28 17:06:11,400 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: closing edit log: position=60832, editlog=/data1/dfs/snn/current/edits
2013-10-28 17:06:11,400 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: close success: truncate to 60832, editlog=/data1/dfs/snn/current/edits
2013-10-28 17:06:11,406 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: closing edit log: position=60832, editlog=/data2/dfs/snn/current/edits
2013-10-28 17:06:11,407 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: close success: truncate to 60832, editlog=/data2/dfs/snn/current/edits
2013-10-28 17:06:11,412 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: closing edit log: position=60832, editlog=/data3/dfs/snn/current/edits
2013-10-28 17:06:11,412 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: close success: truncate to 60832, editlog=/data3/dfs/snn/current/edits
2013-10-28 17:06:11,433 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 1162298 saved in 0 seconds.
2013-10-28 17:06:11,444 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: closing edit log: position=4, editlog=/opt/dfs/snn/current/edits
2013-10-28 17:06:11,445 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: close success: truncate to 4, editlog=/opt/dfs/snn/current/edits
The NameNode also needs the matching property, so that the two sides can reach each other:
<property>
  <name>dfs.secondary.http.address</name>
  <value>secondarynamenode:50090</value>
</property>
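For reference: these logs are from a Hadoop 1.x cluster. If you hit the same symptom on Hadoop 2.x or later, the equivalent settings were renamed, and the pair looks like this instead (same addresses assumed as above):

```xml
<property>
  <name>dfs.namenode.http-address</name>
  <value>192.168.1.81:50070</value>
</property>
<property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>secondarynamenode:50090</value>
</property>
```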