HDFS EC Reconstruction

王小禾

Article link: https://blog.csdn.net/answer100answer/article/details/113138700

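The block reconstruction code for HDFS erasure coding (EC) is cleverly designed, and this article summarizes the ideas behind it.
We start with how a DataNode (DN) reconstructs blocks.
When the DN receives a command from the NameNode (NN) and finds that it is a BlockECReconstructionCommand, it starts the reconstruction work.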

// BPOfferService.java
case DatanodeProtocol.DNA_ERASURE_CODING_RECONSTRUCTION:
  LOG.info("DatanodeCommand action: DNA_ERASURE_CODING_RECOVERY");
  Collection<BlockECReconstructionInfo> ecTasks =
      ((BlockECReconstructionCommand) cmd).getECTasks(); //1
  dn.getErasureCodingWorker().processErasureCodingTasks(ecTasks); //2
  break;

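Next, look at the structure of the objects to be reconstructed, Collection<BlockECReconstructionInfo> ecTasks. Each entry carries the block group ID, the source and target DNs, and related information: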

public static class BlockECReconstructionInfo {
 private final ExtendedBlock block;
 private final DatanodeInfo[] sources;
 private DatanodeInfo[] targets;
 private String[] targetStorageIDs;
 private StorageType[] targetStorageTypes;
 private final byte[] liveBlockIndices;
 private final ErasureCodingPolicy ecPolicy;
 ...

dn.getErasureCodingWorker() returns the field private ErasureCodingWorker ecWorker;. So what role does ecWorker play? Its definition reads:

/**
 * ErasureCodingWorker handles the erasure coding reconstruction work commands.
 * These commands would be issued from Namenode as part of Datanode's heart beat
 * response. BPOfferService delegates the work to this class for handling EC
 * commands.
 */
public final class ErasureCodingWorker {
  private static final Logger LOG = DataNode.LOG;
 
  private final DataNode datanode;
  private final Configuration conf;
  private final float xmitWeight;
 
  private ThreadPoolExecutor stripedReconstructionPool;
  private ThreadPoolExecutor stripedReadPool;

ErasureCodingWorker handles EC reconstruction commands, which the NN sends to the DN as part of its heartbeat response.

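The figure below shows where ECWorker sits in the overall architecture: it serves the DN's block reconstruction and recovery work.
[Figure: the role of ErasureCodingWorker in the DataNode architecture]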

Thread pools

ecWorker holds two thread pools:

private ThreadPoolExecutor stripedReconstructionPool;
private ThreadPoolExecutor stripedReadPool;

First, the initialization of the read pool:

  // ErasureCodingWorker#initializeStripedReadThreadPool
  private void initializeStripedReadThreadPool() {
    LOG.debug("Using striped reads");

    // Essentially, this is a cachedThreadPool.
    stripedReadPool = new ThreadPoolExecutor(0, Integer.MAX_VALUE,
        60, TimeUnit.SECONDS,
        new SynchronousQueue<>(),
        new Daemon.DaemonFactory() {
          private final AtomicInteger threadIndex = new AtomicInteger(0);
          @Override
          public Thread newThread(Runnable r) {
            Thread t = super.newThread(r);
            t.setName("stripedRead-" + threadIndex.getAndIncrement());
            return t;
          }
        },
        new ThreadPoolExecutor.CallerRunsPolicy() {
          @Override
          public void rejectedExecution(Runnable runnable,
                                        ThreadPoolExecutor e) {
            LOG.info("Execution for striped reading rejected, "
                + "Executing in current thread");
            // will run in the current thread
            super.rejectedExecution(runnable, e);
          }
        });

    stripedReadPool.allowCoreThreadTimeOut(true);
  }

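The work queue is a SynchronousQueue, which has no capacity at all (it is not an unbounded queue): each task is handed directly to an idle thread, and if no thread is idle a new one is created. This is why the source comment calls the pool essentially a cachedThreadPool. A thread factory gives the threads readable names. The rejection handler is a custom subclass of CallerRunsPolicy that logs a message and then runs the rejected task in the submitting thread.

On keep-alive: by default only the threads beyond corePoolSize exit after being idle for keepAliveTime (60s here), so the pool shrinks back to corePoolSize. Core threads time out as well only when allowCoreThreadTimeOut is set to true, in which case the pool can shrink to 0 threads. Since corePoolSize is already 0 here, every idle thread exits after 60s either way.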
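The direct-handoff behaviour of a SynchronousQueue-backed pool can be seen in a small standalone sketch. It is not HDFS code: the class DirectHandoffDemo and the method poolSizeFor are hypothetical names, and only the constructor arguments are copied from initializeStripedReadThreadPool. Each task blocks on a latch, so no thread is ever idle and every submission should force a new thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class DirectHandoffDemo {

  /** Runs {@code tasks} blocking tasks at once and returns the resulting pool size. */
  static int poolSizeFor(int tasks) {
    // Same sizing as the striped read pool: core 0, effectively no maximum.
    ThreadPoolExecutor pool = new ThreadPoolExecutor(0, Integer.MAX_VALUE,
        60, TimeUnit.SECONDS, new SynchronousQueue<>());
    CountDownLatch started = new CountDownLatch(tasks);
    CountDownLatch release = new CountDownLatch(1);
    try {
      for (int i = 0; i < tasks; i++) {
        pool.execute(() -> {
          started.countDown();
          try {
            release.await(); // keep this thread busy so it cannot take another task
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        });
      }
      started.await(); // all tasks are running; none could be queued
      return pool.getPoolSize();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      release.countDown(); // let the tasks finish
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println("pool size for 8 concurrent tasks: " + poolSizeFor(8));
  }
}
```

With 8 concurrently blocked tasks the pool should report 8 threads, one per task. With this sizing the rejection handler is not reached in normal operation, because a new thread can always be added while the pool is running.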

The second pool is initialized as follows:

  // ErasureCodingWorker#initializeStripedBlkReconstructionThreadPool
  private void initializeStripedBlkReconstructionThreadPool(int numThreads) {
    LOG.debug("Using striped block reconstruction; pool threads={}", numThreads);
    stripedReconstructionPool = DFSUtilClient.getThreadPoolExecutor(2,
        numThreads, 60, new LinkedBlockingQueue<>(),
        "StripedBlockReconstruction-", false);
    stripedReconstructionPool.allowCoreThreadTimeOut(true);
  }
  
  // The call above goes through a getThreadPoolExecutor helper kept in a utility class:
  // DFSUtilClient#getThreadPoolExecutor
  public static ThreadPoolExecutor getThreadPoolExecutor(
      int corePoolSize,
      int maxPoolSize, 
      long keepAliveTimeSecs, 
      BlockingQueue<Runnable> queue,
      String threadNamePrefix, 
      boolean runRejectedExec) {
    Preconditions.checkArgument(corePoolSize > 0);
    ThreadPoolExecutor threadPoolExecutor = new ThreadPoolExecutor(corePoolSize,
        maxPoolSize, keepAliveTimeSecs, TimeUnit.SECONDS,
        queue, new Daemon.DaemonFactory() {
          private final AtomicInteger threadIndex = new AtomicInteger(0);

          @Override
          public Thread newThread(Runnable r) {
            Thread t = super.newThread(r);
            t.setName(threadNamePrefix + threadIndex.getAndIncrement());
            return t;
          }
        });
    if (runRejectedExec) {
      threadPoolExecutor.setRejectedExecutionHandler(new ThreadPoolExecutor
          .CallerRunsPolicy() {
        @Override
        public void rejectedExecution(Runnable runnable,
            ThreadPoolExecutor e) {
          LOG.info(threadNamePrefix + " task is rejected by " +
                  "ThreadPoolExecutor. Executing it in current thread.");
          // will run in the current thread
          super.rejectedExecution(runnable, e);
        }
      });
    }
    return threadPoolExecutor;
  }

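Pool construction is wrapped in a helper here, but is otherwise much the same. One difference matters: the work queue is an unbounded LinkedBlockingQueue, so ThreadPoolExecutor queues tasks instead of growing past corePoolSize (2 here), and maxPoolSize never comes into play. runRejectedExec is passed as false, so no rejection handler is installed and the ThreadPoolExecutor default applies: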

/**
 * The default rejected execution handler
 */
 private static final RejectedExecutionHandler defaultHandler = new AbortPolicy();
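How an unbounded LinkedBlockingQueue interacts with the pool sizes can be checked with a small standalone sketch. It relies only on standard java.util.concurrent behaviour and is not HDFS code: UnboundedQueueDemo and submitBlocking are hypothetical names, and the values 2, 8 and 10 are illustrative (the real numThreads comes from configuration).

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class UnboundedQueueDemo {

  /** Returns {poolSize, queuedTasks} after submitting {@code tasks} blocking tasks. */
  static int[] submitBlocking(int core, int max, int tasks) {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(core, max,
        60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
    pool.allowCoreThreadTimeOut(true);
    CountDownLatch started = new CountDownLatch(core);
    CountDownLatch release = new CountDownLatch(1);
    try {
      for (int i = 0; i < tasks; i++) {
        // No RejectedExecutionException is expected: the queue always accepts.
        pool.execute(() -> {
          started.countDown();
          try {
            release.await(); // block so the core threads stay busy
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        });
      }
      started.await(); // the core threads are running their first tasks
      return new int[] {pool.getPoolSize(), pool.getQueue().size()};
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      release.countDown(); // let queued tasks drain
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    int[] r = submitBlocking(2, 8, 10);
    System.out.println("pool size=" + r[0] + ", queued=" + r[1]);
  }
}
```

With core 2, max 8 and 10 blocking tasks, the pool should stay at 2 threads with 8 tasks waiting in the queue. The default AbortPolicy would therefore only be expected to fire once the pool has been shut down.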

With the pools initialized, here is how tasks are submitted to them:

  public void processErasureCodingTasks(
      Collection<BlockECReconstructionInfo> ecTasks) {
    for (BlockECReconstructionInfo reconInfo : ecTasks) {
      int xmitsSubmitted = 0;
      try {
        StripedReconstructionInfo stripedReconInfo =
            new StripedReconstructionInfo(
            reconInfo.getExtendedBlock(), reconInfo.getErasureCodingPolicy(),
            reconInfo.getLiveBlockIndices(), reconInfo.getSourceDnInfos(),
            reconInfo.getTargetDnInfos(), reconInfo.getTargetStorageTypes(),
            reconInfo.getTargetStorageIDs());
        // It may throw IllegalArgumentException from task#stripedReader
        // constructor.
        final StripedBlockReconstructor task =
            new StripedBlockReconstructor(this, stripedReconInfo);
        if (task.hasValidTargets()) {
          // See HDFS-12044. We increase xmitsInProgress even the task is only
          // enqueued, so that
          //   1) NN will not send more tasks than what DN can execute and
          //   2) DN will not throw away reconstruction tasks, and instead keeps
          //      an unbounded number of tasks in the executor's task queue.
          xmitsSubmitted = Math.max((int)(task.getXmits() * xmitWeight), 1);
          getDatanode().incrementXmitsInProcess(xmitsSubmitted);
          stripedReconstructionPool.submit(task);
        } else {
          LOG.warn("No missing internal block. Skip reconstruction for task:{}",
              reconInfo);
        }
      } catch (Throwable e) {
        getDatanode().decrementXmitsInProgress(xmitsSubmitted);
        LOG.warn("Failed to reconstruct striped block {}",
            reconInfo.getExtendedBlock().getLocalBlock(), e);
      }
    }
  }
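The weighting expression in processErasureCodingTasks is easy to misread, so here is its arithmetic in isolation. The helper name weightedXmits and the sample inputs are illustrative assumptions; only the expression itself is taken from the code above.

```java
class XmitsDemo {

  /** Same expression as in processErasureCodingTasks: truncate, then clamp to at least 1. */
  static int weightedXmits(int taskXmits, float xmitWeight) {
    return Math.max((int) (taskXmits * xmitWeight), 1);
  }

  public static void main(String[] args) {
    System.out.println(weightedXmits(6, 0.5f)); // 6 * 0.5 = 3.0, truncated to 3
    System.out.println(weightedXmits(1, 0.5f)); // 0.5 truncates to 0, clamped to 1
    System.out.println(weightedXmits(6, 0.0f)); // a zero weight still counts as 1
  }
}
```

The clamp means every accepted task adds at least 1 to the DN's in-progress counter, so even a very small weight cannot make queued reconstruction work invisible to the NN.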

Once a task has been submitted to the pool, the work is carried out by run() of StripedBlockReconstructor, which implements Runnable.

  //StripedBlockReconstructor#run
  public void run() {
    try {
      initDecoderIfNecessary();

      getStripedReader().init();

      stripedWriter.init();

      reconstruct();

      stripedWriter.endTargetBlocks();

      // Currently we don't check the acks for packets, this is similar as
      // block replication.
    } catch (Throwable e) {
      LOG.warn("Failed to reconstruct striped block: {}", getBlockGroup(), e);
      getDatanode().getMetrics().incrECFailedReconstructionTasks();
    } finally {
      getDatanode().decrementXmitsInProgress(getXmits());
      final DataNodeMetrics metrics = getDatanode().getMetrics();
      metrics.incrECReconstructionTasks();
      metrics.incrECReconstructionBytesRead(getBytesRead());
      metrics.incrECReconstructionRemoteBytesRead(getRemoteBytesRead());
      metrics.incrECReconstructionBytesWritten(getBytesWritten());
      getStripedReader().close();
      stripedWriter.close();
      cleanup();
    }
  }
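The accounting pattern used across processErasureCodingTasks and run() (count a task when it is enqueued, release the count in finally whether it succeeds or fails) can be sketched on its own. This is a simplified illustration rather than the HDFS implementation: InFlightCounterDemo, runAll and both counters are hypothetical, and a fixed pool of 2 threads stands in for the reconstruction pool.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class InFlightCounterDemo {

  /** Returns {inFlightAfterAllTasks, failedTasks}; every even-numbered task fails. */
  static int[] runAll(int tasks) {
    AtomicInteger inFlight = new AtomicInteger();
    AtomicInteger failed = new AtomicInteger();
    ExecutorService pool = Executors.newFixedThreadPool(2);
    for (int i = 0; i < tasks; i++) {
      final boolean shouldFail = (i % 2 == 0);
      inFlight.incrementAndGet(); // counted as soon as the task is enqueued
      pool.execute(() -> {
        try {
          if (shouldFail) {
            throw new IllegalStateException("simulated reconstruction failure");
          }
        } catch (Throwable t) {
          failed.incrementAndGet(); // failure metric, as in the catch block of run()
        } finally {
          inFlight.decrementAndGet(); // always released, success or failure
        }
      });
    }
    pool.shutdown();
    try {
      if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
        throw new IllegalStateException("tasks did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return new int[] {inFlight.get(), failed.get()};
  }

  public static void main(String[] args) {
    int[] r = runAll(5);
    System.out.println("in flight=" + r[0] + ", failed=" + r[1]);
  }
}
```

Because the decrement sits in finally, the in-flight count should return to 0 even though some tasks throw, which keeps a load figure reported to a scheduler from drifting upward over time.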