Hadoop configuration file -- hdfs-site.xml (updating)

dfs.datanode.du.reserved

dfs.datanode.du.reserved tells the DataNode how much disk space to leave for non-HDFS use when writing data. Its main purpose is to prevent HDFS failures caused by the disk filling up completely.

I recently ran into a case where HDFS filled a disk. The cause was simple: dfs.datanode.du.reserved had been set without accounting for the space the operating system reserves for itself (the filesystem's reserved blocks).

For example:

A 4 TB disk has about 3.6 TB left after formatting; subtracting the 5% OS-reserved space leaves about 3.4 TB.

But HBase sizes its writes against the disk's total space of 3.6 TB, while only 3.4 TB is actually writable. The difference between total and actual space is 0.2 TB, i.e. about 200 GB.

If dfs.datanode.du.reserved is then set below 200 GB, HBase runs the risk of filling the disk.

Therefore, when configuring HBase, dfs.datanode.du.reserved must be sized according to the space the system actually reserves.

Solutions:

1. Reduce the OS-reserved space (so that total space minus reserved space, i.e. the writable space, grows):

tune2fs -m 1 /dev/diskname

    Note: the OS reserves 5% by default, which corresponds to tune2fs -m 5 /dev/diskname; the command above shrinks the reservation to 1%.
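To check how much space is currently reserved before changing it, tune2fs can dump the filesystem's block counts (a quick sketch; replace /dev/diskname with the actual device):

tune2fs -l /dev/diskname | grep -i 'block count'
# 'Reserved block count' divided by 'Block count' gives the reserved ratio (5% by default)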

2. Increase dfs.datanode.du.reserved.

Best practice:

Since the data disks used for HDFS are usually terabyte-scale, the default 5% OS reservation is quite wasteful. The best approach is therefore to apply fixes 1 and 2 together.

Recommended: lower the OS reservation to 2% and adjust dfs.datanode.du.reserved to match.

For example: with a 4 TB disk and 3.6 TB usable after formatting, a 2% reservation holds back about 75 GB, and dfs.datanode.du.reserved is set to 200 GB.
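As a sketch, that best-practice value would look like this in hdfs-site.xml (the parameter is specified in bytes, so 200 GB = 200 × 1024³ = 214748364800):

<property>
  <name>dfs.datanode.du.reserved</name>
  <value>214748364800</value>
  <!-- reserved space in bytes per volume, left for non-HDFS use -->
</property>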

dfs.replication

We have three settings for Hadoop replication, namely:

dfs.replication.max = 10
dfs.replication.min = 1
dfs.replication     = 2

So dfs.replication is the default replication factor for files in the Hadoop cluster, used unless a Hadoop client sets it manually per file via "setrep". A client can set the replication as high as dfs.replication.max.
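For reference, a minimal hdfs-site.xml sketch of the three settings above (note that in Hadoop 2.x and later the minimum is exposed as dfs.namenode.replication.min, the name that also appears further below):

<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
<property>
  <name>dfs.replication.max</name>
  <value>10</value>
</property>
<property>
  <name>dfs.replication.min</name>
  <value>1</value>
  <!-- dfs.namenode.replication.min in Hadoop 2.x and later -->
</property>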

dfs.replication.min is used in two cases:

During safe mode, the NameNode checks whether each block has reached dfs.replication.min replicas or not.
When writing, dfs.replication.min replicas are written synchronously, and the remaining dfs.replication - dfs.replication.min replicas are written asynchronously.

So do we have to set these configurations on every node (namenode + datanode), or only on the client node?

What if the three settings above vary across different datanodes?

The replication factor can't be set for a specific node in the cluster; you set it for the entire cluster, a directory, or a file. dfs.replication can be updated in hdfs-site.xml on a running cluster.

Set the replication factor for a file: hadoop fs -setrep -w <rep-number> <file-path>
Or set it recursively for a directory or for the entire cluster: hadoop fs -setrep -R -w 1 /
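One way to verify the result (a sketch; <file-path> is a placeholder): -stat with the %r format specifier prints a file's current replication factor, and it also appears in the second column of -ls output.

hadoop fs -stat %r <file-path>
hadoop fs -ls <file-path>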
Use of the min and max replication factors:

While writing data to datanodes, many of them may fail. If dfs.namenode.replication.min replicas are written, the write operation succeeds. After the write, the blocks are replicated asynchronously until they reach the dfs.replication level.

The max replication factor dfs.replication.max sets the upper limit on block replication. A user can't request more replicas than this limit when creating a file.
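For example (hypothetical path), with dfs.replication.max = 10 the NameNode should likewise reject a later attempt to raise the replication past the limit:

hadoop fs -setrep 20 /data/some-file
# expected to fail: requested replication exceeds dfs.replication.max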

You can set a high replication factor on the blocks of a popular file to spread the read load across the cluster.

So do I have to set the configuration parameters on (namenode + datanode), only on the namenode, or on all datanodes? We are using Hive for data loading activities, so how can we set a high replication factor for the blocks of a popular file to distribute the read load on the cluster? – user2950086 May 22 '14 at 12:26
The default block replication you can update in hdfs-site.xml; that is a one-time, cluster-level configuration. If you have to update the replication for a Hive table, then find the table's location in HDFS (default /user/hive/warehouse..) and change the replication with: hadoop fs -setrep -R -w <rep-num> /user/hive/warehouse/xyz_table – Rahul Sharma May 22 '14 at 13:04
Do we have some configuration parameter for Hive which can make the replication for that table configurable? E.g. hive -e "some query" -hiveconf some.configuration.parameter – user2950086 May 23 '14 at 9:48
I haven't come across any such configuration in HiveConf. All hadoop commands can be run from the Hive CLI, e.g. hive> dfs -setrep -w <rep-num> <file-path>; or $ hive -e "dfs -setrep -w <rep-num> <file-path>" – Rahul Sharma May 23 '14 at 12:56
Isn't the dfs.replication.min used ONLY during safe mode? – Marsellus Wallace May 31 '15 at 21:03
@Gevorg Yes, it is used during safe mode as well: once every file has the minimum replication, HDFS exits safe mode. – Rahul Sharma Oct 14 '17 at 0:12
@user2950086 You can set the replication for a directory and create all managed and external tables inside that directory to achieve the desired replication.
