Everything below is based on hadoop-0.20.205.0; I won't call out the version again.
The parameters below are excerpted from the configuration summary in a Hadoop book. I don't cover the concrete steps or the reasoning here; for the full details, see the book Pro Hadoop.
Storage Allocations (parameters that affect storage allocation)
The relevant parameter is dfs.balance.bandwidthPerSec.
The Balancer is run by the command start-balancer.sh
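As a sketch, the balancer's bandwidth cap could be set in hadoop-site.xml like this; the value below is only an example (the 0.20 default is 1048576, i.e. 1 MB/s):

```xml
<!-- Max bytes per second each DataNode may spend moving blocks for the balancer -->
<property>
  <name>dfs.balance.bandwidthPerSec</name>
  <value>10485760</value> <!-- 10 MB/s; example value, not a recommendation -->
</property>
```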
Reserved Disk Space (ensuring enough free disk space)
Hadoop Core provides four parameters: two for HDFS and two for MapReduce.
mapred.local.dir.minspacestart
mapred.local.dir.minspacekill
dfs.datanode.du.reserved
dfs.datanode.du.pct
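A sketch of how these four might appear in hadoop-site.xml; all values are made-up examples, not recommendations:

```xml
<!-- HDFS: reserve space on each DataNode volume -->
<property>
  <name>dfs.datanode.du.reserved</name>
  <value>10737418240</value> <!-- keep 10 GB free per volume; example value -->
</property>
<property>
  <name>dfs.datanode.du.pct</name>
  <value>0.98</value> <!-- fraction of the disk HDFS may use; example value -->
</property>
<!-- MapReduce: free-space thresholds on mapred.local.dir -->
<property>
  <name>mapred.local.dir.minspacestart</name>
  <value>1073741824</value> <!-- below 1 GB free, stop accepting new tasks; example -->
</property>
<property>
  <name>mapred.local.dir.minspacekill</name>
  <value>536870912</value> <!-- below 512 MB free, start killing tasks; example -->
</property>
```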
Server Pending Connections (size of the IPC connection request queue)
parameter: ipc.server.listen.queue.size
NameNode Threads
parameter: dfs.namenode.handler.count
Block Service Threads
parameter: dfs.datanode.handler.count
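A sketch of these three server-side settings in hadoop-site.xml; the non-default values are examples only (the 0.20 defaults are 128, 10, and 3 respectively):

```xml
<property>
  <name>ipc.server.listen.queue.size</name>
  <value>128</value> <!-- pending-connection queue length; 128 is the default -->
</property>
<property>
  <name>dfs.namenode.handler.count</name>
  <value>40</value> <!-- NameNode RPC handler threads; default 10; example value -->
</property>
<property>
  <name>dfs.datanode.handler.count</name>
  <value>10</value> <!-- DataNode block-service threads; default 3; example value -->
</property>
```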
File Descriptors (raising the file descriptor limit)
add an entry of the following form to /etc/security/limits.conf:
* hard nofile 64000
This raises the per-user file descriptor limit to 64000. If you need a much larger number of file descriptors, you may also need to raise the system-wide limit via fs.file-max in /etc/sysctl.conf, for example:
fs.file-max=64000
The former takes effect at the next login; the latter takes effect after a reboot (or immediately via sysctl -p).
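A quick way to check which limits are actually in effect:

```shell
# Per-process open-file limit, as set via limits.conf
ulimit -n
# System-wide limit, as set via fs.file-max
cat /proc/sys/fs/file-max
```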
Also consider the noatime and nodiratime mount options; for the detailed setup see the article 《在linux下使用noatime提升文件系统性能》 (using noatime on Linux to improve filesystem performance). This tweak is meant for the DataNodes.
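For example, an /etc/fstab entry for a DataNode data volume might look like this (device, mount point, and filesystem type are placeholders):

```
/dev/sdb1  /data/dfs  ext3  defaults,noatime,nodiratime  0 0
```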
The Secondary NameNode's metadata is best kept on its own dedicated disk. (The safest setup is to run the NameNode and Secondary NameNode on two separate machines.)
The relevant parameters are fs.checkpoint.dir, fs.checkpoint.period, and fs.checkpoint.size.
The secondary NameNode periodically (every fs.checkpoint.period seconds) requests a checkpoint from the NameNode. At that point, the NameNode closes the current edit log and starts a new one.
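A sketch of the checkpoint settings in hadoop-site.xml; the path is a placeholder, and the other two values are the 0.20 defaults:

```xml
<property>
  <name>fs.checkpoint.dir</name>
  <value>/data/checkpoint</value> <!-- placeholder path, ideally on a dedicated disk -->
</property>
<property>
  <name>fs.checkpoint.period</name>
  <value>3600</value> <!-- seconds between checkpoints; 3600 is the default -->
</property>
<property>
  <name>fs.checkpoint.size</name>
  <value>67108864</value> <!-- edit log size (64 MB) that also triggers a checkpoint; default -->
</property>
```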
NameNode Disk I/O tuning
The relevant parameters are dfs.name.dir and dfs.name.edits.dir: the former stores the filesystem metadata (the fsimage), the latter the edit log.
DataNode Disk I/O Tuning
The relevant parameters are dfs.data.dir, dfs.replication, and dfs.block.size.
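A sketch of how these directory and block settings might look; all paths are placeholders, and the block size shown is an example (the 0.20 default is 67108864, i.e. 64 MB):

```xml
<property>
  <name>dfs.name.dir</name>
  <value>/disk1/name,/disk2/name</value> <!-- comma-separated copies on separate disks; placeholders -->
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/disk1/data,/disk2/data</value> <!-- spread block I/O across disks; placeholders -->
</property>
<property>
  <name>dfs.block.size</name>
  <value>134217728</value> <!-- 128 MB; example, default is 64 MB -->
</property>
<property>
  <name>dfs.replication</name>
  <value>3</value> <!-- the default -->
</property>
```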
Network I/O Tuning
The relevant parameters are dfs.datanode.dns.interface and dfs.datanode.dns.nameserver.
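A sketch of pinning a DataNode's reported address to a specific NIC; the interface name and nameserver host are placeholders:

```xml
<property>
  <name>dfs.datanode.dns.interface</name>
  <value>eth1</value> <!-- report the address of this NIC; placeholder -->
</property>
<property>
  <name>dfs.datanode.dns.nameserver</name>
  <value>ns1.example.com</value> <!-- DNS server used for the reverse lookup; placeholder -->
</property>
```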
Recovery from Failure (common procedures for recovering from errors)
For the NameNode's disks, RAID 1 is recommended; on DataNodes, RAID is best avoided, and if you must use it, RAID 5 is acceptable.
NameNode Recovery
1、Shut down the secondary NameNode.
2、Copy the contents of the secondary's fs.checkpoint.dir to the NameNode's dfs.name.dir.
3、Copy the contents of the secondary's fs.checkpoint.edits.dir to the NameNode's dfs.name.edits.dir.
4、When the copy completes, start the NameNode and restart the secondary NameNode.
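The copy steps above can be simulated locally; the /tmp/demo paths below are stand-ins for fs.checkpoint.dir and dfs.name.dir (on a real cluster the copy would run between machines, e.g. with scp):

```shell
CKPT=/tmp/demo/checkpoint   # stand-in for the secondary's fs.checkpoint.dir
NAME=/tmp/demo/name         # stand-in for the NameNode's dfs.name.dir
mkdir -p "$CKPT" "$NAME"
echo fsimage-bits > "$CKPT/fsimage"   # fake checkpoint image for the demo
cp -a "$CKPT/." "$NAME/"              # step 2: copy checkpoint contents across
ls "$NAME"                            # the name dir now holds the checkpoint files
```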
Adding New Nodes
After adding a new node, run the start-balancer.sh script.
Decommissioning Nodes
1、On the NameNode machine, create a file containing the hostnames or IP addresses of the DataNodes you wish to decommission, say /tmp/nodes_to_decommission. The file should contain one hostname or IP address per line, with standard Unix line endings.
2、Modify the hadoop-site.xml file by adding or updating the following block:
<property>
<name>dfs.hosts.exclude</name>
<value>/tmp/nodes_to_decommission</value>
</property>
3、Run the following command to start the decommissioning process:
hadoop dfsadmin -refreshNodes
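The exclude file itself is plain text, one node per line; the entries below are placeholders:

```
datanode07.example.com
10.0.0.42
```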
Deleted File Recovery
The relevant parameter is fs.trash.interval.
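A sketch of enabling the trash in hadoop-site.xml; the value is an example:

```xml
<property>
  <name>fs.trash.interval</name>
  <value>1440</value> <!-- minutes deleted files stay in the trash; 0 (the default) disables it -->
</property>
```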
Data Loss or Corruption
This again concerns the NameNode (even if RAID 0 or RAID 5 was in use). The steps are:
1、Archive the data if required.
2、Wipe all of the directories listed in dfs.name.dir.
3、Copy the contents of the fs.checkpoint.dir from the secondary NameNode to the fs.checkpoint.dir on the primary NameNode machine.
4、Run the following NameNode command:
hadoop namenode -importCheckpoint