How to change how quickly the Namenode detects a failed Datanode

I was recently asked how to force the namenode to detect failed datanodes more quickly.

Lowering this detection time is very useful when testing, but can be quite risky on busy clusters.
In Hadoop 0.19, the parameter  heartbeat.recheck.interval  is the primary control.
This value, in milliseconds, is just under half of the period after which a Datanode is declared dead.

In hadoop 0.19, in FSNamesystem.java:

long heartbeatInterval = conf.getLong("dfs.heartbeat.interval", 3) * 1000; // 3 seconds
this.heartbeatRecheckInterval = conf.getInt( "heartbeat.recheck.interval", 5 * 60 * 1000); // 5 minutes
this.heartbeatExpireInterval = 2 * heartbeatRecheckInterval + 10 * heartbeatInterval;
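To see what these defaults mean in practice, here is a small stand-alone sketch (plain Java, no Hadoop dependency) that reproduces the expiry arithmetic above, first with the 0.19 defaults and then with a lowered recheck interval for testing:

```java
// Stand-alone reproduction of the FSNamesystem expiry arithmetic (no Hadoop dependency).
public class HeartbeatExpiry {
    // Mirrors FSNamesystem: expire = 2 * recheckInterval + 10 * heartbeatInterval
    static long expireIntervalMs(long recheckMs, long heartbeatMs) {
        return 2 * recheckMs + 10 * heartbeatMs;
    }

    public static void main(String[] args) {
        // Defaults: heartbeat.recheck.interval = 5 min, dfs.heartbeat.interval = 3 s
        long def = expireIntervalMs(5 * 60 * 1000, 3 * 1000);
        System.out.println("Default expiry: " + def + " ms");   // 630000 ms = 10.5 minutes

        // A hypothetical test-cluster setting: recheck every 15 s instead of 5 min
        long fast = expireIntervalMs(15 * 1000, 3 * 1000);
        System.out.println("Fast expiry:    " + fast + " ms");  // 60000 ms = 1 minute
    }
}
```

So with the stock configuration a dead datanode is only declared failed after 10.5 minutes; shrinking  heartbeat.recheck.interval  is what brings that down.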

To change the timeouts at your client/application level, the relevant parameters are read as:
this.socketTimeout = conf.getInt("dfs.socket.timeout",
HdfsConstants.READ_TIMEOUT);
this.datanodeWriteTimeout = conf.getInt("dfs.datanode.socket.write.timeout",
HdfsConstants.WRITE_TIMEOUT);

dfs.socket.timeout  controls the base timeout for read/connect operations against a datanode.

The constants are:
// Timeouts for communicating with DataNode for streaming writes/reads
public static int READ_TIMEOUT = 60 * 1000;
public static int WRITE_TIMEOUT = 8 * 60 * 1000;
public static int WRITE_TIMEOUT_EXTENSION = 5 * 1000; //for write pipeline



The write timeout for an individual dfs write operation is defined as
long writeTimeout = HdfsConstants.WRITE_TIMEOUT_EXTENSION * nodes.length +
datanodeWriteTimeout;
that is, the 8-minute base plus 5 seconds per datanode in the write pipeline.
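Plugging in the constants makes the scaling concrete; this sketch (plain Java, using the 0.19 constant values) computes the write timeout for a given pipeline length:

```java
// Sketch of the per-write timeout arithmetic, using the 0.19 HdfsConstants values.
public class WriteTimeoutCalc {
    static final long WRITE_TIMEOUT = 8 * 60 * 1000;       // 480000 ms base
    static final long WRITE_TIMEOUT_EXTENSION = 5 * 1000;  // 5 s per pipeline node

    static long writeTimeoutMs(int pipelineNodes) {
        return WRITE_TIMEOUT_EXTENSION * pipelineNodes + WRITE_TIMEOUT;
    }

    public static void main(String[] args) {
        // With the common replication factor of 3:
        System.out.println(writeTimeoutMs(3) + " ms"); // 495000 ms = 8 min 15 s
    }
}
```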

The socket timeout for a dfs operation is defined as:
int timeoutValue = 3000 * nodes.length + socketTimeout;
which is the 60-second base timeout plus 3 seconds per datanode in the pipeline.
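The same arithmetic for the read/connect side, again as a stand-alone sketch with the 0.19 default of READ_TIMEOUT (60 s) as the base:

```java
// Sketch of the read/connect timeout arithmetic: 3 s per pipeline node on top of
// the dfs.socket.timeout base (READ_TIMEOUT = 60 s by default in 0.19).
public class ReadTimeoutCalc {
    static int timeoutMs(int pipelineNodes, int socketTimeoutMs) {
        return 3000 * pipelineNodes + socketTimeoutMs;
    }

    public static void main(String[] args) {
        System.out.println(timeoutMs(3, 60 * 1000) + " ms"); // 69000 ms at replication 3
    }
}
```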