NameNode Recovery Tools for the Hadoop Distributed File System


weixin_30746117



Article tags: java, big data

Original link: http://www.cnblogs.com/bramblewalls/p/5612918.html


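This article introduces the NameNode recovery tools in the Hadoop Distributed File System. These include edit log failover, more robust end-of-file validation, and HDFS FSCK, which reduce the impact of hard disk failures on the NameNode. Manual NameNode metadata recovery and NameNode recovery mode can handle metadata corruption and help restore a damaged filesystem.

Summary generated by CSDN using AI technology.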

Reposted from: http://blog.cloudera.com/blog/2012/05/namenode-recovery-tools-for-the-hadoop-distributed-file-system/

Warning: The procedure described below can cause data loss. Contact Cloudera Support before attempting it.

Most system administrators have had to deal with a bad hard disk at some point. One moment, the hard disk is a mechanical marvel; the next, it is an expensive paperweight.

The HDFS (Hadoop Distributed File System) community has been steadily working to diminish the impact of disk failures on overall system availability. In this article, I’m mostly going to talk about how to minimize the impact of hard disk failures on the NameNode.

The NameNode’s function is to store metadata. In filesystem jargon, metadata is “data about data”: things like file owners, permission bits, and so forth. HDFS stores its metadata on the NameNode in two main places: the FSImage and the edit log.
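A rough mental model helps here. The FSImage is a checkpoint of the namespace, and the edit log records the transactions made after that checkpoint. At startup the NameNode loads the image and replays the later edits on top of it.

The sketch below models that idea with a hypothetical `MetadataReplaySketch` class. Its only operation is "set the owner of a path". This is an illustrative assumption, not HDFS code, and the real formats and opcodes are far richer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

/** Toy model (hypothetical): metadata = last checkpoint (FSImage) + replay of later edits. */
public class MetadataReplaySketch {

    /** One hypothetical edit-log entry: "set the owner of this path". */
    static final class Edit {
        final long txId;
        final String path;
        final String owner;

        Edit(long txId, String path, String owner) {
            this.txId = txId;
            this.path = path;
            this.owner = owner;
        }
    }

    /**
     * Starts from a checkpoint that already reflects every transaction up to
     * imageTxId, then applies only the edits that came after it, in log order.
     */
    static TreeMap<String, String> load(TreeMap<String, String> image, long imageTxId, List<Edit> editLog) {
        TreeMap<String, String> namespace = new TreeMap<>(image); // copy so the image itself is untouched
        for (Edit e : editLog) {
            if (e.txId <= imageTxId) {
                continue; // already contained in the checkpoint
            }
            namespace.put(e.path, e.owner);
        }
        return namespace;
    }

    public static void main(String[] args) {
        TreeMap<String, String> image = new TreeMap<>();
        image.put("/user/alice/a.txt", "alice");
        List<Edit> log = Arrays.asList(
                new Edit(1, "/user/alice/a.txt", "alice"), // already in the image
                new Edit(2, "/user/bob/b.txt", "bob"),
                new Edit(3, "/user/alice/a.txt", "hdfs"));
        System.out.println(load(image, 1, log));
    }
}
```

Running `main` loads an image that already covers transaction 1 and then applies transactions 2 and 3. As a result, `/user/alice/a.txt` ends up owned by `hdfs`.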

Edit Log Failover

It is a good practice to configure your NameNode to store multiple copies of its metadata. By storing two copies of the edit log and FSImage, on two separate hard disks, a good system administrator can avoid bringing down the NameNode if one of those disks fails.

During the NameNode’s startup process, it reads both the FSImage and the edit log. But what if the first place it looks is unreadable, because of a hardware problem or disk corruption? Previously, the NameNode would abort the startup process if it encountered an error while reading an edit log. The administrator would have to remove the corrupt edit log and restart the NameNode. With edit log failover, the NameNode will mark that location as failed automatically, and continue trying the other locations.
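Here is a minimal sketch of the failover idea. It assumes each copy of the edit log can be read as a whole list of operations. `EditLogCopy`, `readFirstHealthyCopy`, the directory paths, and the operation strings are all hypothetical. The real NameNode reads log segments by transaction range across its storage directories, which this sketch does not model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Sketch (hypothetical names): try each configured copy of the edit log in
 * turn; if one cannot be read, record it as failed and move on to the next.
 */
public class EditLogFailoverSketch {

    /** Hypothetical stand-in for one on-disk copy of the edit log. */
    interface EditLogCopy {
        String location();

        List<String> readAll() throws Exception; // throws if the disk or file is bad
    }

    final List<String> failedLocations = new ArrayList<>();

    List<String> readFirstHealthyCopy(List<EditLogCopy> copies) {
        for (EditLogCopy copy : copies) {
            try {
                return copy.readAll();
            } catch (Exception e) {
                // Old behavior: abort startup here. With failover: mark and continue.
                failedLocations.add(copy.location());
            }
        }
        throw new IllegalStateException("No readable edit log copy among " + copies.size() + " locations");
    }

    /** Builds an in-memory copy that either returns ops or fails like a bad disk. */
    static EditLogCopy copy(String location, List<String> ops, boolean broken) {
        return new EditLogCopy() {
            public String location() {
                return location;
            }

            public List<String> readAll() throws Exception {
                if (broken) {
                    throw new java.io.IOException("unreadable: " + location);
                }
                return ops;
            }
        };
    }

    public static void main(String[] args) {
        List<String> ops = Arrays.asList("OP_MKDIR /a", "OP_ADD /a/f");
        EditLogFailoverSketch loader = new EditLogFailoverSketch();
        List<String> result = loader.readFirstHealthyCopy(Arrays.asList(
                copy("/disk1/name", ops, true),
                copy("/disk2/name", ops, false)));
        System.out.println(result + " failed=" + loader.failedLocations);
    }
}
```

The first copy throws, so it is recorded in `failedLocations` and the operations come from the second copy. Startup only gives up when no copy is readable.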

More Robust End-of-File Validation

On disk, the edit log file is padded at the end. Because of this padding, we can’t simply keep reading the edit log until we hit an end-of-file (EOF) condition. Instead, we have to rely on other clues to know where the log ends.

Formerly, the clue we relied on was finding an OP_INVALID opcode. As soon as we read an OP_INVALID opcode, we would immediately assume that there was nothing more to read. However, this is not the most robust way to determine where a file ends. Because an OP_INVALID opcode is a single byte, the likelihood that random corruption could produce an early EOF was unacceptably high.
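To see why a one-byte sentinel is fragile, here is a toy reader that applies the old rule. The layout is invented for illustration: fixed 4-byte records, an assumed `OP_INVALID` value of `0xFF`, and `0xFF` padding. Corrupting a single opcode byte into `0xFF` makes the reader stop early, and it reports no error at all.

```java
import java.util.Arrays;

/**
 * Toy illustration of the old end-of-log rule: stop at the first OP_INVALID
 * opcode. Assumed layout (NOT the real HDFS format): 1-byte opcode plus a
 * 3-byte payload per record, OP_INVALID = 0xFF, and 0xFF padding at the end.
 */
public class SentinelEofSketch {
    static final byte OP_INVALID = (byte) 0xFF;
    static final int RECORD_LEN = 4;

    /** Returns how many records the old rule accepts before it assumes EOF. */
    static int countRecordsOldRule(byte[] log) {
        int count = 0;
        for (int off = 0; off + RECORD_LEN <= log.length; off += RECORD_LEN) {
            if (log[off] == OP_INVALID) {
                break; // treated as end of file, with no further checks
            }
            count++;
        }
        return count;
    }

    /** Builds `records` records with opcode 0x01, followed by 0xFF padding. */
    static byte[] buildLog(int records, int paddingBytes) {
        byte[] log = new byte[records * RECORD_LEN + paddingBytes];
        Arrays.fill(log, OP_INVALID);
        for (int i = 0; i < records; i++) {
            log[i * RECORD_LEN] = 0x01;         // opcode
            log[i * RECORD_LEN + 1] = (byte) i; // payload
            log[i * RECORD_LEN + 2] = 0;
            log[i * RECORD_LEN + 3] = 0;
        }
        return log;
    }

    public static void main(String[] args) {
        byte[] log = buildLog(5, 16);
        System.out.println("intact: " + countRecordsOldRule(log));
        log[2 * RECORD_LEN] = OP_INVALID; // a single corrupted byte in the third record
        System.out.println("corrupted: " + countRecordsOldRule(log));
    }
}
```

If a corrupted byte takes a uniformly random value, it matches a given 1-byte sentinel with probability 1/256. That is far too likely for a condition that silently discards every later transaction.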

How can we do better? In most cases, we know which transaction ID an edit log ends on, so we can simply verify that the last edit log operation we read from the file matches it. In cases where we don’t know the ending transaction ID, we can verify that the rest of the file after the last operation contains only padding bytes. This makes the edit log code even more robust.
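The two checks can be sketched against a similar toy layout, here with each record holding a 1-byte opcode and an 8-byte transaction ID. The layout, the `0xFF` padding value, and the class and method names are assumptions for illustration, not the HDFS on-disk format.

- `endsAtExpectedTxId` covers the case where the ending transaction ID is known.
- `tailIsOnlyPadding` covers the case where it is not.

```java
import java.nio.ByteBuffer;

/**
 * Toy model of the stricter end-of-log checks. Assumed layout (NOT the real
 * HDFS format): each record is a 1-byte opcode followed by an 8-byte
 * transaction ID; the file tail is 0xFF padding, and 0xFF doubles as OP_INVALID.
 */
public class EofValidationSketch {
    static final byte OP_INVALID = (byte) 0xFF;
    static final int RECORD_LEN = 1 + Long.BYTES;

    /** Result of a scan: last transaction ID read and the offset where reading stopped. */
    static final class Scan {
        final long lastTxId;
        final int stopOffset;

        Scan(long lastTxId, int stopOffset) {
            this.lastTxId = lastTxId;
            this.stopOffset = stopOffset;
        }
    }

    /** Reads records until an OP_INVALID opcode or too few bytes remain. */
    static Scan scan(byte[] log) {
        ByteBuffer buf = ByteBuffer.wrap(log);
        long lastTxId = -1;
        while (buf.remaining() >= RECORD_LEN && buf.get(buf.position()) != OP_INVALID) {
            buf.get();                // opcode (ignored in this sketch)
            lastTxId = buf.getLong(); // transaction ID
        }
        return new Scan(lastTxId, buf.position());
    }

    /** Check 1: we know which transaction the log should end on. */
    static boolean endsAtExpectedTxId(byte[] log, long expectedEndTxId) {
        return scan(log).lastTxId == expectedEndTxId;
    }

    /** Check 2: the end is unknown, so every byte after the stop point must be padding. */
    static boolean tailIsOnlyPadding(byte[] log) {
        for (int i = scan(log).stopOffset; i < log.length; i++) {
            if (log[i] != OP_INVALID) {
                return false;
            }
        }
        return true;
    }

    /** Builds consecutive transactions starting at firstTxId, then 0xFF padding. */
    static byte[] buildLog(long firstTxId, int records, int paddingBytes) {
        ByteBuffer buf = ByteBuffer.allocate(records * RECORD_LEN + paddingBytes);
        for (int i = 0; i < records; i++) {
            buf.put((byte) 0x01).putLong(firstTxId + i);
        }
        while (buf.hasRemaining()) {
            buf.put(OP_INVALID);
        }
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] log = buildLog(1, 5, 32); // transactions 1..5
        System.out.println(endsAtExpectedTxId(log, 5) + " " + tailIsOnlyPadding(log));
        log[2 * RECORD_LEN] = OP_INVALID; // corrupt the opcode of transaction 3
        System.out.println(endsAtExpectedTxId(log, 5) + " " + tailIsOnlyPadding(log));
    }
}
```

With transaction 3's opcode corrupted into `0xFF`, the scan stops after transaction 2. Both checks then fail:

- The first fails because 2 is not 5.
- The second fails because the bytes of transaction 3 are not padding.

Under the old single-byte rule, the same corruption would have gone unnoticed.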

HDFS FSCK

When your local ext3 or ext4 filesystem has become corrupted, the fsck command can usually repair it. Fsck is an offline process which examines on-disk structures and usually offers to fix them if they are damaged.

HDFS has its own fsck command, which you can access by running “hdfs fsck”. Like the ext3 fsck, HDFS fsck determines which files contain corrupt blocks and gives you options for fixing them.

However, HDFS fsck only operates on data, not metadata. On a local filesystem, this distinction is irrelevant, because data and metadata are stored in the same place. However, for HDFS, metadata is stored on the NameNode, whereas data is stored on the DataNodes.
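A toy model makes the data-versus-metadata split concrete. The NameNode's metadata is modeled as a map from file names to block IDs. The DataNodes' data is modeled as a count of healthy replicas per block. Both maps, the block names, and `filesWithUnreadableBlocks` are hypothetical. Real `hdfs fsck` reports more than this, for example under-replicated blocks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model (hypothetical): metadata is the file -> block list mapping held by
 * the NameNode; data is the block replicas held by DataNodes. An fsck-style
 * scan takes the mapping as given and only checks whether each block still
 * has at least one healthy replica.
 */
public class FsckSketch {
    static List<String> filesWithUnreadableBlocks(
            Map<String, List<String>> fileToBlocks, // metadata (NameNode)
            Map<String, Integer> healthyReplicas) { // data (DataNodes)
        List<String> bad = new ArrayList<>();
        for (Map.Entry<String, List<String>> file : fileToBlocks.entrySet()) {
            for (String block : file.getValue()) {
                if (healthyReplicas.getOrDefault(block, 0) == 0) {
                    bad.add(file.getKey());
                    break; // one unreadable block is enough to flag the file
                }
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        Map<String, List<String>> meta = new LinkedHashMap<>();
        meta.put("/logs/a", Arrays.asList("blk_1", "blk_2"));
        meta.put("/logs/b", Arrays.asList("blk_3"));
        Map<String, Integer> replicas = new LinkedHashMap<>();
        replicas.put("blk_1", 3);
        replicas.put("blk_2", 0); // every replica corrupt or lost
        replicas.put("blk_3", 2);
        System.out.println(filesWithUnreadableBlocks(meta, replicas));
    }
}
```

The scan never questions `fileToBlocks`. That is why a tool of this kind cannot help when the metadata itself is damaged.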

Manual NameNode Metadata Recovery

When properly configured, HDFS is much more robust against metadata corruption than a local filesystem, because it stores multiple copies of everything. However, every copy can still end up damaged, and a truly robust system should handle that case too. So we added the capability for an administrator to recover a partial or corrupted edit log. This new functionality is called manual NameNode recovery.

Similar to fsck, NameNode recovery is an offline process. An administrator can run NameNode recovery to recover a corrupted edit log. This can be very helpful for getting a corrupted filesystem back on its feet.

NameNode Recovery in Action

Let’s test out recovery mode. To activate recovery mode, you start the NameNode with the -recover flag, like so:

./bin/hadoop namenode -recover

At this point, the NameNode will ask you whether you want to continue.

You have selected Metadata Recovery mode. This mode is intended to recover
lost metadata on a corrupt filesystem. Metadata recovery mode often
permanently deletes data from your HDFS filesystem. Please back up your edit
log and fsimage before trying this!

Are you ready to proceed? (Y/N)
(Y or N)

Once you answer yes, the recovery process will read as much of the edit log as possible. When there is an error or an ambiguity, it will ask you how to proceed.
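The interaction can be pictured as a loop. It applies each readable transaction and consults the operator at each bad one. The sketch offers three choices:

- skip the bad transaction;
- stop reading and keep what was read so far;
- quit without saving.

These choices and every name in the sketch are illustrative assumptions, not the actual prompts or code of NameNode recovery mode. The operator is passed in as a function so the loop can be exercised without a terminal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/**
 * Sketch of an interactive recovery loop (hypothetical names and choices):
 * apply every readable transaction and ask the operator how to handle each
 * unreadable one.
 */
public class RecoveryLoopSketch {
    enum Choice { SKIP_AND_CONTINUE, STOP_HERE, QUIT }

    /** Stand-in for one transaction; a null op means it failed to decode. */
    static final class Txn {
        final long txId;
        final String op;

        Txn(long txId, String op) {
            this.txId = txId;
            this.op = op;
        }
    }

    /** Returns the applied operations, or null if the operator quits without saving. */
    static List<String> recover(List<Txn> log, Function<Long, Choice> askOperator) {
        List<String> applied = new ArrayList<>();
        for (Txn t : log) {
            if (t.op != null) {
                applied.add(t.op);
                continue;
            }
            switch (askOperator.apply(t.txId)) {
                case SKIP_AND_CONTINUE:
                    break;           // drop only this transaction, keep reading
                case STOP_HERE:
                    return applied;  // drop this transaction and everything after it
                case QUIT:
                    return null;     // leave the metadata untouched
            }
        }
        return applied;
    }

    public static void main(String[] args) {
        List<Txn> log = Arrays.asList(
                new Txn(1, "mkdir /a"), new Txn(2, "create /a/f"),
                new Txn(3, null), new Txn(4, "delete /a/f"));
        System.out.println(recover(log, tx -> Choice.SKIP_AND_CONTINUE));
        System.out.println(recover(log, tx -> Choice.STOP_HERE));
    }
}
```

Both answers trade completeness for a loadable namespace. Skipping keeps later transactions, which may depend on the one that was dropped. Stopping guarantees a consistent prefix but discards everything after the bad transaction.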

In this example, we encounter an error when trying to read transaction ID 3:

11:10:41,443 ERROR FSImage:147 - Error replaying edit log at offset 71. Expected transaction ID was 3
Recent opcode offsets: 17 71
org.apache.hadoop.fs.ChecksumException: Transaction is corrupt. Calculated checksum is -1642375052 but read checksum -6897
at org.apache.hadoop.hdfs.server.namenode.