How HDFS Uses Multiple Disks

1 fs.default.name

To run HDFS, you need to designate one machine as a namenode. In this case, the
property fs.default.name is an HDFS filesystem URI whose host is the namenode's
hostname or IP address, and whose port is the port that the namenode will listen on for RPCs.
If no port is specified, the default of 8020 is used.

The fs.default.name property also doubles as specifying the default filesystem. The
default filesystem is used to resolve relative paths, which are handy since they
save typing (and avoid hardcoding knowledge of a particular namenode's address). For
example, with the default filesystem defined in Example 9-1, the relative URI /a/b is
resolved to hdfs://namenode/a/b.
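
As a concrete illustration, a minimal core-site.xml along these lines might look like the sketch below; the hostname namenode is a placeholder for your actual namenode host, and omitting the port means the default of 8020 is used.

<?xml version="1.0"?>
<!-- core-site.xml (sketch): "namenode" is a placeholder hostname -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <!-- No port specified, so the namenode listens on the default port 8020 -->
    <value>hdfs://namenode/</value>
  </property>
</configuration>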

2 dfs.name.dir


There are a few other configuration properties you should set for HDFS: those that set
the storage directories for the namenode and for datanodes. The property
dfs.name.dir specifies a list of directories where the namenode stores persistent
filesystem metadata (the edit log and the filesystem image). A copy of each of the metadata
files is stored in each directory for redundancy (that is, the namenode writes identical
data to every directory listed in dfs.name.dir).

It's common to configure dfs.name.dir so that the namenode metadata is written to one or two local disks and
a remote disk, such as an NFS-mounted directory. Such a setup guards against failure
of a local disk and against failure of the entire namenode, since in both cases the files can be
recovered and used to start a new namenode. (The secondary namenode takes only
periodic checkpoints of the namenode, so it does not provide an up-to-date backup of
the namenode.)
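
For example, an hdfs-site.xml entry following this advice might look like the sketch below; the paths /disk1/hdfs/name and /remote/hdfs/name are purely illustrative (the second is assumed to be an NFS mount), so substitute your own local disks and remote mount point.

  <property>
    <name>dfs.name.dir</name>
    <!-- Identical copies of the metadata are written to every directory listed here -->
    <value>/disk1/hdfs/name,/remote/hdfs/name</value>
  </property>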


3 dfs.data.dir


You should also set the dfs.data.dir property, which specifies a list of directories for
a datanode to store its blocks. Unlike the namenode, which uses multiple directories
for redundancy, a datanode round-robins writes between its storage directories (that is,
the data stored in each dfs.data.dir location is different), so for
performance you should specify a storage directory for each local disk. Read performance
also benefits from having multiple disks for storage, because blocks will be spread
across them, and concurrent reads for distinct blocks will be correspondingly spread
across disks.
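
For instance, on a datanode with two data disks mounted at the hypothetical paths /disk1 and /disk2, the property might be set as in the sketch below, one directory per disk:

  <property>
    <name>dfs.data.dir</name>
    <!-- One entry per local disk; block writes are round-robined across them -->
    <value>/disk1/hdfs/data,/disk2/hdfs/data</value>
  </property>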

4 fs.checkpoint.dir

Finally, you should configure where the secondary namenode stores its checkpoints of
the filesystem. The fs.checkpoint.dir property specifies a list of directories where the
checkpoints are kept. Like the storage directories for the namenode, which keep
redundant copies of the namenode metadata, the checkpointed filesystem image is stored
in each checkpoint directory for redundancy.
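
A corresponding sketch, again using illustrative /disk1 and /disk2 paths, might be:

  <property>
    <name>fs.checkpoint.dir</name>
    <!-- Each listed directory holds a redundant copy of the checkpointed image -->
    <value>/disk1/hdfs/namesecondary,/disk2/hdfs/namesecondary</value>
  </property>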

Note that the storage directories for HDFS are under Hadoop's temporary
directory by default (the hadoop.tmp.dir property, whose default
is /tmp/hadoop-${user.name}).
Therefore it is critical that these properties
are set so that data is not lost by the system clearing out temporary
directories.
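
One coarse safeguard, assuming a persistent location such as /var/hadoop exists on your nodes, is to move hadoop.tmp.dir itself off /tmp, so that any storage property you forget to set still defaults to persistent storage:

  <property>
    <name>hadoop.tmp.dir</name>
    <!-- /var/hadoop is an illustrative path; any persistent local directory will do -->
    <value>/var/hadoop/tmp-${user.name}</value>
  </property>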
