How HDFS Uses Multiple Disks

1 fs.default.name

To run HDFS, you need to designate one machine as a namenode. In this case, the
property fs.default.name is an HDFS filesystem URI whose host is the namenode's
hostname or IP address, and whose port is the port that the namenode will listen on for RPCs.
If no port is specified, the default of 8020 is used.

The fs.default.name property also doubles as specifying the default filesystem. The
default filesystem is used to resolve relative paths, which are handy since they
save typing (and avoid hardcoding knowledge of a particular namenode's address). For
example, with the default filesystem defined in Example 9-1, the relative URI /a/b is
resolved to hdfs://namenode/a/b.
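
As a concrete illustration, a minimal core-site.xml along these lines might look like the sketch below; the hostname namenode is a placeholder for your actual namenode host, and omitting the port means the default of 8020 is used.

<?xml version="1.0"?>
<!-- core-site.xml (sketch): "namenode" is a placeholder hostname -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <!-- No port specified, so the namenode listens on the default port 8020 -->
    <value>hdfs://namenode/</value>
  </property>
</configuration>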

2 dfs.name.dir


There are a few other configuration properties you should set for HDFS: those that set
the storage directories for the namenode and for datanodes. The property
dfs.name.dir specifies a list of directories where the namenode stores persistent
filesystem metadata (the edit log and the filesystem image). A copy of each of the metadata
files is stored in each directory for redundancy (that is, the namenode writes identical
data to every directory listed in dfs.name.dir).

It's common to configure dfs.name.dir so that the namenode metadata is written to one or two local disks and
a remote disk, such as an NFS-mounted directory. Such a setup guards against failure
of a local disk and against failure of the entire namenode, since in both cases the files can be
recovered and used to start a new namenode. (The secondary namenode takes only
periodic checkpoints of the namenode, so it does not provide an up-to-date backup of
the namenode.)
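
For example, an hdfs-site.xml entry following this advice might look like the sketch below; the paths /disk1/hdfs/name and /remote/hdfs/name are purely illustrative (the second is assumed to be an NFS mount), so substitute your own local disks and remote mount point.

  <property>
    <name>dfs.name.dir</name>
    <!-- Identical copies of the metadata are written to every directory listed here -->
    <value>/disk1/hdfs/name,/remote/hdfs/name</value>
  </property>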


3 dfs.data.dir


You should also set the dfs.data.dir property, which specifies a list of directories for
a datanode to store its blocks. Unlike the namenode, which uses multiple directories
for redundancy, a datanode round-robins writes between its storage directories (that is,
the data stored in each dfs.data.dir location is different), so for
performance you should specify a storage directory for each local disk. Read performance
also benefits from having multiple disks for storage, because blocks will be spread
across them, and concurrent reads for distinct blocks will be correspondingly spread
across disks.
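
For instance, on a datanode with two data disks mounted at the hypothetical paths /disk1 and /disk2, the property might be set as in the sketch below, one directory per disk:

  <property>
    <name>dfs.data.dir</name>
    <!-- One entry per local disk; block writes are round-robined across them -->
    <value>/disk1/hdfs/data,/disk2/hdfs/data</value>
  </property>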

4 fs.checkpoint.dir

Finally, you should configure where the secondary namenode stores its checkpoints of
the filesystem. The fs.checkpoint.dir property specifies a list of directories where the
checkpoints are kept. Like the storage directories for the namenode, which keep
redundant copies of the namenode metadata, the checkpointed filesystem image is stored
in each checkpoint directory for redundancy.
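
A corresponding sketch, again using illustrative /disk1 and /disk2 paths, might be:

  <property>
    <name>fs.checkpoint.dir</name>
    <!-- Each listed directory holds a redundant copy of the checkpointed image -->
    <value>/disk1/hdfs/namesecondary,/disk2/hdfs/namesecondary</value>
  </property>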

Note that the storage directories for HDFS are under Hadoop's temporary
directory by default (the hadoop.tmp.dir property, whose default
is /tmp/hadoop-${user.name}).
Therefore it is critical that these properties
are set so that data is not lost by the system clearing out temporary
directories.
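
One coarse safeguard, assuming a persistent location such as /var/hadoop exists on your nodes, is to move hadoop.tmp.dir itself off /tmp, so that any storage property you forget to set still defaults to persistent storage:

  <property>
    <name>hadoop.tmp.dir</name>
    <!-- /var/hadoop is an illustrative path; any persistent local directory will do -->
    <value>/var/hadoop/tmp-${user.name}</value>
  </property>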
