HDFS EC Reconstruction

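The code that reconstructs erasure-coded (EC) blocks in HDFS is cleverly designed; this article summarizes the ideas behind it.
We start with how a DataNode (DN) reconstructs blocks.
When the DN receives a command from the NameNode (NN) and finds that it is a BlockECReconstructionCommand, it starts the reconstruction work.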

// BPOfferService.java
case DatanodeProtocol.DNA_ERASURE_CODING_RECONSTRUCTION:
  LOG.info("DatanodeCommand action: DNA_ERASURE_CODING_RECOVERY");
  Collection<BlockECReconstructionInfo> ecTasks =
      ((BlockECReconstructionCommand) cmd).getECTasks(); //1
  dn.getErasureCodingWorker().processErasureCodingTasks(ecTasks); //2
  break;

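Let's look at the structure of the reconstruction tasks, Collection<BlockECReconstructionInfo> ecTasks. Each entry carries the block group ID along with the source and target DN information: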

public static class BlockECReconstructionInfo {
 private final ExtendedBlock block;
 private final DatanodeInfo[] sources;
 private DatanodeInfo[] targets;
 private String[] targetStorageIDs;
 private StorageType[] targetStorageTypes;
 private final byte[] liveBlockIndices;
 private final ErasureCodingPolicy ecPolicy;
 ...
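
To make the role of liveBlockIndices concrete, here is a minimal sketch of how the missing internal blocks of a block group could be derived from it. The class and method names are hypothetical, and the RS(6,3) layout (6 data units plus 3 parity units) is only an assumed example; the real DataNode code also takes block lengths into account.

```java
import java.util.ArrayList;
import java.util.List;

public class MissingIndicesSketch {
  /** Returns the internal block indices in [0, totalUnits) that are not listed as live. */
  public static List<Integer> missingIndices(byte[] liveBlockIndices, int totalUnits) {
    boolean[] live = new boolean[totalUnits];
    for (byte idx : liveBlockIndices) {
      live[idx] = true;
    }
    List<Integer> missing = new ArrayList<>();
    for (int i = 0; i < totalUnits; i++) {
      if (!live[i]) {
        missing.add(i);
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    // Hypothetical RS(6,3) block group: indices 0..8, with index 2 lost.
    byte[] live = {0, 1, 3, 4, 5, 6, 7, 8};
    System.out.println(missingIndices(live, 9)); // prints [2]
  }
}
```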

dn.getErasureCodingWorker() returns the field private ErasureCodingWorker ecWorker;. So what role does ecWorker play? Here is its definition:

/**
 * ErasureCodingWorker handles the erasure coding reconstruction work commands.
 * These commands would be issued from Namenode as part of Datanode's heart beat
 * response. BPOfferService delegates the work to this class for handling EC
 * commands.
 */
public final class ErasureCodingWorker {
  private static final Logger LOG = DataNode.LOG;
 
  private final DataNode datanode;
  private final Configuration conf;
  private final float xmitWeight;
 
  private ThreadPoolExecutor stripedReconstructionPool;
  private ThreadPoolExecutor stripedReadPool;

ErasureCodingWorker handles EC reconstruction commands, which the NN issues to the DN as part of its heartbeat response.

Within the overall architecture, ECWorker serves the DN's block reconstruction and recovery work.
[Figure: the role of ECWorker in the DataNode architecture]
Thread Pools

ecWorker holds two thread pools:

  • private ThreadPoolExecutor stripedReconstructionPool;
  • private ThreadPoolExecutor stripedReadPool;

Thread pool initialization

  // ErasureCodingWorker#initializeStripedReadThreadPool
  private void initializeStripedReadThreadPool() {
    LOG.debug("Using striped reads");

    // Essentially, this is a cachedThreadPool.
    stripedReadPool = new ThreadPoolExecutor(0, Integer.MAX_VALUE,
        60, TimeUnit.SECONDS,
        new SynchronousQueue<>(),
        new Daemon.DaemonFactory() {
          private final AtomicInteger threadIndex = new AtomicInteger(0);
          @Override
          public Thread newThread(Runnable r) {
            Thread t = super.newThread(r);
            t.setName("stripedRead-" + threadIndex.getAndIncrement());
            return t;
          }
        },
        new ThreadPoolExecutor.CallerRunsPolicy() {
          @Override
          public void rejectedExecution(Runnable runnable,
                                        ThreadPoolExecutor e) {
            LOG.info("Execution for striped reading rejected, "
                + "Executing in current thread");
            // will run in the current thread
            super.rejectedExecution(runnable, e);
          }
        });

    stripedReadPool.allowCoreThreadTimeOut(true);
  }

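The work queue is a new SynchronousQueue<>(), which has no capacity at all (it is not an unbounded queue): each task is handed directly to a thread, and because the maximum pool size is Integer.MAX_VALUE, a new thread is created whenever no idle thread is waiting. A thread factory gives the threads readable names. The rejection handler is a customized CallerRunsPolicy that logs a message and then runs the rejected task in the submitting thread. By default, only threads beyond corePoolSize exit after being idle for keepAliveTime (60s here), and core threads never time out; once allowCoreThreadTimeOut(true) is called, core threads time out as well, so the pool can shrink to 0 threads. Here corePoolSize is already 0, so every idle thread exits after 60s either way.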
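
The hand-off behaviour of a zero-core pool backed by a SynchronousQueue can be observed with plain JDK classes. The sketch below is not Hadoop code: the class name and the threadsFor helper are made up for illustration. It submits tasks that all block at the same time and reports how many threads the pool created for them.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CachedPoolSketch {
  /** Submits `tasks` simultaneously blocking tasks and returns the resulting pool size. */
  public static int threadsFor(int tasks) {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(0, Integer.MAX_VALUE,
        60, TimeUnit.SECONDS, new SynchronousQueue<>());
    CountDownLatch started = new CountDownLatch(tasks);
    CountDownLatch release = new CountDownLatch(1);
    try {
      for (int i = 0; i < tasks; i++) {
        pool.execute(() -> {
          started.countDown();
          try {
            release.await(); // keep the thread busy until the measurement is done
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        });
      }
      started.await(); // every task is now running on its own thread
      return pool.getPoolSize();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      release.countDown();
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    // No task can wait in a SynchronousQueue, so 4 busy tasks need 4 threads.
    System.out.println(threadsFor(4)); // prints 4
  }
}
```

Since nothing is ever queued, concurrency in such a pool is limited only by how many tasks callers submit at once.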

Initializing the second thread pool

  // ErasureCodingWorker#initializeStripedBlkReconstructionThreadPool
  private void initializeStripedBlkReconstructionThreadPool(int numThreads) {
    LOG.debug("Using striped block reconstruction; pool threads={}", numThreads);
    stripedReconstructionPool = DFSUtilClient.getThreadPoolExecutor(2,
        numThreads, 60, new LinkedBlockingQueue<>(),
        "StripedBlockReconstruction-", false);
    stripedReconstructionPool.allowCoreThreadTimeOut(true);
  }
  
  // The call above goes through a getThreadPoolExecutor helper kept in a utility class:
  // DFSUtilClient#getThreadPoolExecutor
  public static ThreadPoolExecutor getThreadPoolExecutor(
      int corePoolSize,
      int maxPoolSize, 
      long keepAliveTimeSecs, 
      BlockingQueue<Runnable> queue,
      String threadNamePrefix, 
      boolean runRejectedExec) {
    Preconditions.checkArgument(corePoolSize > 0);
    ThreadPoolExecutor threadPoolExecutor = new ThreadPoolExecutor(corePoolSize,
        maxPoolSize, keepAliveTimeSecs, TimeUnit.SECONDS,
        queue, new Daemon.DaemonFactory() {
          private final AtomicInteger threadIndex = new AtomicInteger(0);

          @Override
          public Thread newThread(Runnable r) {
            Thread t = super.newThread(r);
            t.setName(threadNamePrefix + threadIndex.getAndIncrement());
            return t;
          }
        });
    if (runRejectedExec) {
      threadPoolExecutor.setRejectedExecutionHandler(new ThreadPoolExecutor
          .CallerRunsPolicy() {
        @Override
        public void rejectedExecution(Runnable runnable,
            ThreadPoolExecutor e) {
          LOG.info(threadNamePrefix + " task is rejected by " +
                  "ThreadPoolExecutor. Executing it in current thread.");
          // will run in the current thread
          super.rejectedExecution(runnable, e);
        }
      });
    }
    return threadPoolExecutor;
  }  

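The pool creation is wrapped in a helper method, but the construction is essentially the same as before. Because the LinkedBlockingQueue is unbounded, tasks beyond the core threads wait in the queue rather than triggering new threads. false is passed for runRejectedExec, so no rejection handler is installed and ThreadPoolExecutor's default is used: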

/**
 * The default rejected execution handler
 */
 private static final RejectedExecutionHandler defaultHandler = new AbortPolicy();
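
As a rough illustration of how a pool with an unbounded LinkedBlockingQueue and the default AbortPolicy behaves, the following JDK-only sketch may help. The class and method names are hypothetical, and the sizes (2 core threads, maximum 8, 10 tasks) are chosen only for the demonstration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class UnboundedQueueSketch {
  /** Returns {poolSize, queuedTasks} after submitting `tasks` blocking tasks. */
  public static int[] submitBlocking(int core, int max, int tasks) {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(core, max,
        60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
    CountDownLatch release = new CountDownLatch(1);
    try {
      for (int i = 0; i < tasks; i++) {
        pool.execute(() -> {
          try {
            release.await(); // hold the worker so later tasks must queue
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        });
      }
      return new int[] {pool.getPoolSize(), pool.getQueue().size()};
    } finally {
      release.countDown();
      pool.shutdown();
    }
  }

  /** With the default AbortPolicy, submitting after shutdown throws. */
  public static boolean rejectedAfterShutdown() {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(2, 8,
        60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
    pool.shutdown();
    try {
      pool.execute(() -> { });
      return false;
    } catch (RejectedExecutionException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    int[] r = submitBlocking(2, 8, 10);
    System.out.println("poolSize=" + r[0] + " queued=" + r[1]); // prints poolSize=2 queued=8
    System.out.println(rejectedAfterShutdown()); // prints true
  }
}
```

The pool stays at its core size because ThreadPoolExecutor only creates threads beyond corePoolSize when the queue refuses a task, which an unbounded queue never does.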
 

With the pools initialized, here is how tasks are submitted to the reconstruction pool:

  public void processErasureCodingTasks(
      Collection<BlockECReconstructionInfo> ecTasks) {
    for (BlockECReconstructionInfo reconInfo : ecTasks) {
      int xmitsSubmitted = 0;
      try {
        StripedReconstructionInfo stripedReconInfo =
            new StripedReconstructionInfo(
            reconInfo.getExtendedBlock(), reconInfo.getErasureCodingPolicy(),
            reconInfo.getLiveBlockIndices(), reconInfo.getSourceDnInfos(),
            reconInfo.getTargetDnInfos(), reconInfo.getTargetStorageTypes(),
            reconInfo.getTargetStorageIDs());
        // It may throw IllegalArgumentException from task#stripedReader
        // constructor.
        final StripedBlockReconstructor task =
            new StripedBlockReconstructor(this, stripedReconInfo);
        if (task.hasValidTargets()) {
          // See HDFS-12044. We increase xmitsInProgress even the task is only
          // enqueued, so that
          //   1) NN will not send more tasks than what DN can execute and
          //   2) DN will not throw away reconstruction tasks, and instead keeps
          //      an unbounded number of tasks in the executor's task queue.
          xmitsSubmitted = Math.max((int)(task.getXmits() * xmitWeight), 1);
          getDatanode().incrementXmitsInProcess(xmitsSubmitted);
          stripedReconstructionPool.submit(task);
        } else {
          LOG.warn("No missing internal block. Skip reconstruction for task:{}",
              reconInfo);
        }
      } catch (Throwable e) {
        getDatanode().decrementXmitsInProgress(xmitsSubmitted);
        LOG.warn("Failed to reconstruct striped block {}",
            reconInfo.getExtendedBlock().getLocalBlock(), e);
      }
    }
  }
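
The accounting pattern in processErasureCodingTasks (charge the counter before enqueueing, refund it if anything fails) can be sketched in isolation. This is a simplified model rather than the DataNode implementation: the class, the submit signature, and the Runnable standing in for the pool submission are all assumptions made for illustration. Only the Math.max((int)(xmits * weight), 1) rounding is taken from the snippet above.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class XmitsAccountingSketch {
  private final AtomicInteger xmitsInProgress = new AtomicInteger();

  /** Weights the task cost, but always charges at least 1. */
  public static int weightedXmits(int taskXmits, float xmitWeight) {
    return Math.max((int) (taskXmits * xmitWeight), 1);
  }

  /** Charges the counter before enqueueing; refunds the charge if enqueueing fails. */
  public boolean submit(int taskXmits, float xmitWeight, Runnable enqueue) {
    int charged = 0;
    try {
      charged = weightedXmits(taskXmits, xmitWeight);
      xmitsInProgress.addAndGet(charged);
      enqueue.run(); // stands in for stripedReconstructionPool.submit(task)
      return true;
    } catch (RuntimeException e) {
      xmitsInProgress.addAndGet(-charged);
      return false;
    }
  }

  public int inProgress() {
    return xmitsInProgress.get();
  }

  public static void main(String[] args) {
    XmitsAccountingSketch s = new XmitsAccountingSketch();
    s.submit(6, 0.5f, () -> { });   // charges 3
    s.submit(1, 0.5f, () -> { });   // 0.5 truncates to 0, so the minimum of 1 is charged
    s.submit(6, 0.5f, () -> {       // charges 3, then refunds 3 on failure
      throw new IllegalStateException("queue closed");
    });
    System.out.println(s.inProgress()); // prints 4
  }
}
```

Charging at enqueue time is what lets the NN see the DN's backlog, as the HDFS-12044 comment in the original code explains.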

Once submitted, the task is executed through the run() method of StripedBlockReconstructor, which implements Runnable:

  //StripedBlockReconstructor#run
  public void run() {
    try {
      initDecoderIfNecessary();

      getStripedReader().init();

      stripedWriter.init();

      reconstruct();

      stripedWriter.endTargetBlocks();

      // Currently we don't check the acks for packets, this is similar as
      // block replication.
    } catch (Throwable e) {
      LOG.warn("Failed to reconstruct striped block: {}", getBlockGroup(), e);
      getDatanode().getMetrics().incrECFailedReconstructionTasks();
    } finally {
      getDatanode().decrementXmitsInProgress(getXmits());
      final DataNodeMetrics metrics = getDatanode().getMetrics();
      metrics.incrECReconstructionTasks();
      metrics.incrECReconstructionBytesRead(getBytesRead());
      metrics.incrECReconstructionRemoteBytesRead(getRemoteBytesRead());
      metrics.incrECReconstructionBytesWritten(getBytesWritten());
      getStripedReader().close();
      stripedWriter.close();
      cleanup();
    }
  }