Everything below is based on hadoop-0.20.205.0; I won't call out the version again.
The parameters below are excerpted from the configuration summary in a Hadoop book. I don't cover the concrete steps or the reasoning here; for the full details, see the book Pro Hadoop.
Storage Allocations (parameters that affect storage allocation)
The relevant parameter is dfs.balance.bandwidthPerSec.
The Balancer is run by the command start-balancer.sh
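As a sketch, the balancer's bandwidth cap could be set in hadoop-site.xml like this; the value below is only an example (the 0.20 default is 1048576, i.e. 1 MB/s):

```xml
<!-- Max bytes per second each DataNode may spend moving blocks for the balancer -->
<property>
  <name>dfs.balance.bandwidthPerSec</name>
  <value>10485760</value> <!-- 10 MB/s; example value, not a recommendation -->
</property>
```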
Reserved Disk Space (ensuring enough free disk space)
Hadoop Core provides four parameters: two for HDFS and two for MapReduce.
mapred.local.dir.minspacestart
mapred.local.dir.minspacekill
dfs.datanode.du.reserved
dfs.datanode.du.pct
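A sketch of how these four might appear in hadoop-site.xml; all values are made-up examples, not recommendations:

```xml
<!-- HDFS: reserve space on each DataNode volume -->
<property>
  <name>dfs.datanode.du.reserved</name>
  <value>10737418240</value> <!-- keep 10 GB free per volume; example value -->
</property>
<property>
  <name>dfs.datanode.du.pct</name>
  <value>0.98</value> <!-- fraction of the disk HDFS may use; example value -->
</property>
<!-- MapReduce: free-space thresholds on mapred.local.dir -->
<property>
  <name>mapred.local.dir.minspacestart</name>
  <value>1073741824</value> <!-- below 1 GB free, stop accepting new tasks; example -->
</property>
<property>
  <name>mapred.local.dir.minspacekill</name>
  <value>536870912</value> <!-- below 512 MB free, start killing tasks; example -->
</property>
```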
Server Pending Connections (size of the IPC connection request queue)
parameter: ipc.server.listen.queue.size
NameNode Threads
parameter: dfs.namenode.handler.count
Block Service Threads
parameter: dfs.datanode.handler.count
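A sketch of these three server-side settings in hadoop-site.xml; the non-default values are examples only (the 0.20 defaults are 128, 10, and 3 respectively):

```xml
<property>
  <name>ipc.server.listen.queue.size</name>
  <value>128</value> <!-- pending-connection queue length; 128 is the default -->
</property>
<property>
  <name>dfs.namenode.handler.count</name>
  <value>40</value> <!-- NameNode RPC handler threads; default 10; example value -->
</property>
<property>
  <name>dfs.datanode.handler.count</name>
  <value>10</value> <!-- DataNode block-service threads; default 3; example value -->
</property>
```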
File Descriptors (raising the file descriptor limit)
add an entry of the following form to /etc/security/limits.conf:
* hard nofile 64000
This raises the per-user file descriptor limit to 64000. If you need a much larger number of file descriptors, you may also need to raise the system-wide limit via fs.file-max in /etc/sysctl.conf, for example:
fs.file-max=64000
The former takes effect at the next login; the latter takes effect after a reboot (or immediately via sysctl -p).
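A quick way to check which limits are actually in effect:

```shell
# Per-process open-file limit, as set via limits.conf
ulimit -n
# System-wide limit, as set via fs.file-max
cat /proc/sys/fs/file-max
```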
Also consider the noatime and nodiratime mount options; for the detailed setup see the article 《在linux下使用noatime提升文件系统性能》 (using noatime on Linux to improve filesystem performance). This tweak is meant for the DataNodes.
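For example, an /etc/fstab entry for a DataNode data volume might look like this (device, mount point, and filesystem type are placeholders):

```
/dev/sdb1  /data/dfs  ext3  defaults,noatime,nodiratime  0 0
```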
The Secondary NameNode's metadata is best kept on its own dedicated disk. (The safest setup is to run the NameNode and Secondary NameNode on two separate machines.)
The relevant parameters are fs.checkpoint.dir, fs.checkpoint.period, and fs.checkpoint.size.
The secondary NameNode periodically (every fs.checkpoint.period seconds) requests a checkpoint from the NameNode. At that point, the NameNode closes the current edit log and starts a new one.
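A sketch of the checkpoint settings in hadoop-site.xml; the path is a placeholder, and the other two values are the 0.20 defaults:

```xml
<property>
  <name>fs.checkpoint.dir</name>
  <value>/data/checkpoint</value> <!-- placeholder path, ideally on a dedicated disk -->
</property>
<property>
  <name>fs.checkpoint.period</name>
  <value>3600</value> <!-- seconds between checkpoints; 3600 is the default -->
</property>
<property>
  <name>fs.checkpoint.size</name>
  <value>67108864</value> <!-- edit log size (64 MB) that also triggers a checkpoint; default -->
</property>
```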
NameNode Disk I/O tuning
The relevant parameters are dfs.name.dir and dfs.name.edits.dir: the former stores the filesystem metadata (the fsimage), the latter the edit log.
DataNode Disk I/O Tuning
The relevant parameters are dfs.data.dir, dfs.replication, and dfs.block.size.
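A sketch of how these directory and block settings might look; all paths are placeholders, and the block size shown is an example (the 0.20 default is 67108864, i.e. 64 MB):

```xml
<property>
  <name>dfs.name.dir</name>
  <value>/disk1/name,/disk2/name</value> <!-- comma-separated copies on separate disks; placeholders -->
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/disk1/data,/disk2/data</value> <!-- spread block I/O across disks; placeholders -->
</property>
<property>
  <name>dfs.block.size</name>
  <value>134217728</value> <!-- 128 MB; example, default is 64 MB -->
</property>
<property>
  <name>dfs.replication</name>
  <value>3</value> <!-- the default -->
</property>
```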
Network I/O Tuning
The relevant parameters are dfs.datanode.dns.interface and dfs.datanode.dns.nameserver.
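A sketch of pinning a DataNode's reported address to a specific NIC; the interface name and nameserver host are placeholders:

```xml
<property>
  <name>dfs.datanode.dns.interface</name>
  <value>eth1</value> <!-- report the address of this NIC; placeholder -->
</property>
<property>
  <name>dfs.datanode.dns.nameserver</name>
  <value>ns1.example.com</value> <!-- DNS server used for the reverse lookup; placeholder -->
</property>
```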
Recovery from Failure (common procedures for recovering from errors)
For the NameNode's disks, RAID 1 is recommended; on DataNodes, RAID is best avoided, and if you must use it, RAID 5 is acceptable.
NameNode Recovery
1、Shut down the secondary NameNode.
2、Copy the contents of the secondary's fs.checkpoint.dir to the NameNode's dfs.name.dir.
3、Copy the contents of the secondary's fs.checkpoint.edits.dir to the NameNode's dfs.name.edits.dir.
4、When the copy completes, start the NameNode and restart the secondary NameNode.
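The copy steps above can be simulated locally; the /tmp/demo paths below are stand-ins for fs.checkpoint.dir and dfs.name.dir (on a real cluster the copy would run between machines, e.g. with scp):

```shell
CKPT=/tmp/demo/checkpoint   # stand-in for the secondary's fs.checkpoint.dir
NAME=/tmp/demo/name         # stand-in for the NameNode's dfs.name.dir
mkdir -p "$CKPT" "$NAME"
echo fsimage-bits > "$CKPT/fsimage"   # fake checkpoint image for the demo
cp -a "$CKPT/." "$NAME/"              # step 2: copy checkpoint contents across
ls "$NAME"                            # the name dir now holds the checkpoint files
```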
Adding New Nodes
After adding a new node, run the start-balancer.sh script.
Decommissioning Nodes
1、On the NameNode machine, create a file containing the hostnames or IP addresses of the DataNodes you wish to decommission, say /tmp/nodes_to_decommission. The file should contain one hostname or IP address per line, with standard Unix line endings.
2、Modify the hadoop-site.xml file by adding or updating the following block:
<property>
<name>dfs.hosts.exclude</name>
<value>/tmp/nodes_to_decommission</value>
</property>
3、Run the following command to start the decommissioning process:
hadoop dfsadmin -refreshNodes
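The exclude file itself is plain text, one node per line; the entries below are placeholders:

```
datanode07.example.com
10.0.0.42
```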
Deleted File Recovery
The relevant parameter is fs.trash.interval.
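A sketch of enabling the trash in hadoop-site.xml; the value is an example:

```xml
<property>
  <name>fs.trash.interval</name>
  <value>1440</value> <!-- minutes deleted files stay in the trash; 0 (the default) disables it -->
</property>
```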
Data Loss or Corruption
This again concerns the NameNode (even if RAID 0 or RAID 5 was in use). The steps are:
1、Archive the data if required.
2、Wipe all of the directories listed in dfs.name.dir.
3、Copy the contents of the fs.checkpoint.dir from the secondary NameNode to the fs.checkpoint.dir on the primary NameNode machine.
4、Run the following NameNode command:
hadoop namenode -importCheckpoint