一、简介
DFSClient Hedged Read是Hadoop-2.4.0引入的一个新特性,如果读取一个数据块的操作比较慢,DFSClient Hedged Read将会开启一个从另一个副本的hedged读操作。我们会选取首先完成的操作,并取消其它操作。这个Hedged读特性将有助于控制异常值,比如由于命中一个坏盘等原因而需要花费较长时间的异常阅读等。
二、开启
DFSClient Hedged Read特性默认是关闭的。如果要开启,则需配置如下:
1、dfs.client.hedged.read.threadpool.size
并发Hedged 读的线程池大小
2、dfs.client.hedged.read.threshold.millis
开启一个Hedged 读前的等待时间(毫秒)
三、实现分析
1、DFSClient实现
DFSClient中,定义了一个静态线程池:
private static ThreadPoolExecutor HEDGED_READ_THREAD_POOL;
DFSClient的构造函数中,有如下处理:
this.hedgedReadThresholdMillis = conf.getLong(
DFSConfigKeys.DFS_DFSCLIENT_HEDGED_READ_THRESHOLD_MILLIS,
DFSConfigKeys.DEFAULT_DFSCLIENT_HEDGED_READ_THRESHOLD_MILLIS);
int numThreads = conf.getInt(
DFSConfigKeys.DFS_DFSCLIENT_HEDGED_READ_THREADPOOL_SIZE,
DFSConfigKeys.DEFAULT_DFSCLIENT_HEDGED_READ_THREADPOOL_SIZE);
if (numThreads > 0) {
this.initThreadsNumForHedgedReads(numThreads);
}
根据参数dfs.client.hedged.read.threadpool.size确定是否实例化线程池,而initThreadsNumForHedgedReads()方法如下:
/**
* Create hedged reads thread pool, HEDGED_READ_THREAD_POOL, if
* it does not already exist.
* @param num Number of threads for hedged reads thread pool.
* If zero, skip hedged reads thread pool creation.
*/
private synchronized void initThreadsNumForHedgedReads(int num) {
if (num <= 0 || HEDGED_READ_THREAD_POOL != null) return;
HEDGED_READ_THREAD_POOL = new ThreadPoolExecutor(1, num, 60,
TimeUnit.SECONDS, new SynchronousQueue<Runnable>(),
new Daemon.DaemonFactory() {
private final AtomicInteger threadIndex =