HDFS EC Reconstruction

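The code that reconstructs erasure-coded (EC) blocks in HDFS is cleverly designed, and this post summarizes the ideas behind it.
We start with how a DataNode (DN) reconstructs blocks.
When the DN receives a command from the NameNode (NN) and finds that it is a BlockECReconstructionCommand, it starts the reconstruction work.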

// BPOfferService.java
case DatanodeProtocol.DNA_ERASURE_CODING_RECONSTRUCTION:
  LOG.info("DatanodeCommand action: DNA_ERASURE_CODING_RECOVERY");
  Collection<BlockECReconstructionInfo> ecTasks =
      ((BlockECReconstructionCommand) cmd).getECTasks(); //1
  dn.getErasureCodingWorker().processErasureCodingTasks(ecTasks); //2
  break;

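Next, look at the structure of the objects to be reconstructed, Collection<BlockECReconstructionInfo> ecTasks. Each entry carries the block group ID, the source and target DNs, and related information: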

public static class BlockECReconstructionInfo {
 private final ExtendedBlock block;
 private final DatanodeInfo[] sources;
 private DatanodeInfo[] targets;
 private String[] targetStorageIDs;
 private StorageType[] targetStorageTypes;
 private final byte[] liveBlockIndices;
 private final ErasureCodingPolicy ecPolicy;
 ...
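To make the liveBlockIndices field more concrete, here is a minimal sketch of how the missing internal blocks could be derived from it. It assumes that liveBlockIndices lists the positions, within the block group, of the internal blocks that are still readable. The class and method names are hypothetical and not part of Hadoop, and the real code has more to consider (for example, block groups shorter than a full stripe).

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical helper, not Hadoop code.
final class LiveIndicesSketch {

  /** Returns the block-group positions that do not appear in liveBlockIndices. */
  static List<Integer> missingIndices(byte[] liveBlockIndices,
                                      int dataUnits, int parityUnits) {
    int total = dataUnits + parityUnits;
    BitSet live = new BitSet(total);
    for (byte idx : liveBlockIndices) {
      live.set(idx);                 // mark every readable internal block
    }
    List<Integer> missing = new ArrayList<>();
    for (int i = 0; i < total; i++) {
      if (!live.get(i)) {
        missing.add(i);              // candidate for reconstruction
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    // A 6 data + 3 parity layout where the internal block at index 3 is lost.
    byte[] live = {0, 1, 2, 4, 5, 6, 7, 8};
    System.out.println(missingIndices(live, 6, 3));
  }
}
```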

dn.getErasureCodingWorker() returns the field private ErasureCodingWorker ecWorker;. So what role does ecWorker play? Here is its definition:

/**
 * ErasureCodingWorker handles the erasure coding reconstruction work commands.
 * These commands would be issued from Namenode as part of Datanode's heart beat
 * response. BPOfferService delegates the work to this class for handling EC
 * commands.
 */
public final class ErasureCodingWorker {
  private static final Logger LOG = DataNode.LOG;
 
  private final DataNode datanode;
  private final Configuration conf;
  private final float xmitWeight;
 
  private ThreadPoolExecutor stripedReconstructionPool;
  private ThreadPoolExecutor stripedReadPool;

ErasureCodingWorker handles EC reconstruction commands, which the NN sends to the DN as part of its heartbeat response.

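Within the overall architecture, the ECWorker carries out block reconstruction and recovery work on behalf of the DN.
[Figure: the role of ErasureCodingWorker in the DataNode architecture]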
Thread pools

Let's look at the two thread pools in ecWorker:

  • private ThreadPoolExecutor stripedReconstructionPool;
  • private ThreadPoolExecutor stripedReadPool;

Thread pool initialization

  // ErasureCodingWorker#initializeStripedReadThreadPool
  private void initializeStripedReadThreadPool() {
    LOG.debug("Using striped reads");

    // Essentially, this is a cachedThreadPool.
    stripedReadPool = new ThreadPoolExecutor(0, Integer.MAX_VALUE,
        60, TimeUnit.SECONDS,
        new SynchronousQueue<>(),
        new Daemon.DaemonFactory() {
          private final AtomicInteger threadIndex = new AtomicInteger(0);
          @Override
          public Thread newThread(Runnable r) {
            Thread t = super.newThread(r);
            t.setName("stripedRead-" + threadIndex.getAndIncrement());
            return t;
          }
        },
        new ThreadPoolExecutor.CallerRunsPolicy() {
          @Override
          public void rejectedExecution(Runnable runnable,
                                        ThreadPoolExecutor e) {
            LOG.info("Execution for striped reading rejected, "
                + "Executing in current thread");
            // will run in the current thread
            super.rejectedExecution(runnable, e);
          }
        });

    stripedReadPool.allowCoreThreadTimeOut(true);
  }

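The queue is a SynchronousQueue. It is not an unbounded queue: it has zero capacity, so each task is either handed directly to an idle thread or a new thread is created for it. A thread factory names the threads. The rejection policy is a custom subclass of CallerRunsPolicy that logs a message and then runs the rejected task in the submitting thread. As for timeouts, core threads do not time out by default: an idle thread exits after keepAliveTime (60 s here) only while the thread count exceeds corePoolSize. With allowCoreThreadTimeOut(true), core threads time out as well, so the pool can shrink to zero threads. Since corePoolSize is already 0 here, every idle thread exits after 60 s either way.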
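The direct hand-off behavior can be observed with a small standalone program. This is only a sketch: the class and method names are made up for illustration, and the pool merely copies the core size, maximum size, keep-alive time and queue type of stripedReadPool, without Hadoop's thread factory or rejection handler.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not Hadoop code.
final class DirectHandoffPoolSketch {

  /** Submits n tasks that all block, then reports how many threads exist. */
  static int threadsCreatedFor(int n) {
    // Same shape as stripedReadPool: no core threads, no upper bound,
    // and a zero-capacity hand-off queue.
    ThreadPoolExecutor pool = new ThreadPoolExecutor(0, Integer.MAX_VALUE,
        60, TimeUnit.SECONDS, new SynchronousQueue<>());
    CountDownLatch started = new CountDownLatch(n);
    CountDownLatch release = new CountDownLatch(1);
    try {
      for (int i = 0; i < n; i++) {
        pool.execute(() -> {
          started.countDown();
          try {
            release.await();         // keep this worker busy
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        });
      }
      started.await();               // every task is now running
      return pool.getPoolSize();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      release.countDown();           // let the workers finish
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println("threads created: " + threadsCreatedFor(4));
  }
}
```

Because no worker is ever idle while the tasks block, nothing can be queued, and each submission starts its own thread.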

Initializing the second thread pool

  // ErasureCodingWorker#initializeStripedBlkReconstructionThreadPool
  private void initializeStripedBlkReconstructionThreadPool(int numThreads) {
    LOG.debug("Using striped block reconstruction; pool threads={}", numThreads);
    stripedReconstructionPool = DFSUtilClient.getThreadPoolExecutor(2,
        numThreads, 60, new LinkedBlockingQueue<>(),
        "StripedBlockReconstruction-", false);
    stripedReconstructionPool.allowCoreThreadTimeOut(true);
  }
  
  // The call above goes through a getThreadPoolExecutor helper in a utility class:
  // DFSUtilClient#getThreadPoolExecutor
  public static ThreadPoolExecutor getThreadPoolExecutor(
      int corePoolSize,
      int maxPoolSize, 
      long keepAliveTimeSecs, 
      BlockingQueue<Runnable> queue,
      String threadNamePrefix, 
      boolean runRejectedExec) {
    Preconditions.checkArgument(corePoolSize > 0);
    ThreadPoolExecutor threadPoolExecutor = new ThreadPoolExecutor(corePoolSize,
        maxPoolSize, keepAliveTimeSecs, TimeUnit.SECONDS,
        queue, new Daemon.DaemonFactory() {
          private final AtomicInteger threadIndex = new AtomicInteger(0);

          @Override
          public Thread newThread(Runnable r) {
            Thread t = super.newThread(r);
            t.setName(threadNamePrefix + threadIndex.getAndIncrement());
            return t;
          }
        });
    if (runRejectedExec) {
      threadPoolExecutor.setRejectedExecutionHandler(new ThreadPoolExecutor
          .CallerRunsPolicy() {
        @Override
        public void rejectedExecution(Runnable runnable,
            ThreadPoolExecutor e) {
          LOG.info(threadNamePrefix + " task is rejected by " +
                  "ThreadPoolExecutor. Executing it in current thread.");
          // will run in the current thread
          super.rejectedExecution(runnable, e);
        }
      });
    }
    return threadPoolExecutor;
  }  

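Pool creation is wrapped in a helper method here, but the result is much the same. Because runRejectedExec is passed as false, no rejection handler is installed, so the default policy of ThreadPoolExecutor applies: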

/**
 * The default rejected execution handler
 */
 private static final RejectedExecutionHandler defaultHandler = new AbortPolicy();
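One consequence of pairing a ThreadPoolExecutor with an unbounded LinkedBlockingQueue is worth seeing in isolation. Once corePoolSize threads are busy, further tasks are queued rather than rejected, and no threads beyond the core size are started. The sketch below uses a plain ThreadPoolExecutor with illustrative sizes (core 2, max 8). The class name is hypothetical, and it is not a claim about how any particular Hadoop release sizes stripedReconstructionPool.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not Hadoop code.
final class UnboundedQueuePoolSketch {

  /** Returns {poolSize, queuedTasks} after submitting blocking tasks. */
  static int[] submitBlockingTasks(int core, int max, int tasks) {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(core, max,
        60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
    CountDownLatch release = new CountDownLatch(1);
    try {
      for (int i = 0; i < tasks; i++) {
        pool.execute(() -> {
          try {
            release.await();         // occupy the worker until released
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        });
      }
      return new int[] {pool.getPoolSize(), pool.getQueue().size()};
    } finally {
      release.countDown();
      pool.shutdown();               // queued tasks still run, then threads exit
    }
  }

  public static void main(String[] args) {
    int[] r = submitBlockingTasks(2, 8, 10);
    System.out.println("pool size=" + r[0] + ", queued=" + r[1]);
    // prints "pool size=2, queued=8"
  }
}
```

With such a queue the default AbortPolicy comes into play only when a task is submitted after the pool has been shut down.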
 

With the pools initialized, here is how tasks are submitted to the pool:

  public void processErasureCodingTasks(
      Collection<BlockECReconstructionInfo> ecTasks) {
    for (BlockECReconstructionInfo reconInfo : ecTasks) {
      int xmitsSubmitted = 0;
      try {
        StripedReconstructionInfo stripedReconInfo =
            new StripedReconstructionInfo(
            reconInfo.getExtendedBlock(), reconInfo.getErasureCodingPolicy(),
            reconInfo.getLiveBlockIndices(), reconInfo.getSourceDnInfos(),
            reconInfo.getTargetDnInfos(), reconInfo.getTargetStorageTypes(),
            reconInfo.getTargetStorageIDs());
        // It may throw IllegalArgumentException from task#stripedReader
        // constructor.
        final StripedBlockReconstructor task =
            new StripedBlockReconstructor(this, stripedReconInfo);
        if (task.hasValidTargets()) {
          // See HDFS-12044. We increase xmitsInProgress even the task is only
          // enqueued, so that
          //   1) NN will not send more tasks than what DN can execute and
          //   2) DN will not throw away reconstruction tasks, and instead keeps
          //      an unbounded number of tasks in the executor's task queue.
          xmitsSubmitted = Math.max((int)(task.getXmits() * xmitWeight), 1);
          getDatanode().incrementXmitsInProcess(xmitsSubmitted);
          stripedReconstructionPool.submit(task);
        } else {
          LOG.warn("No missing internal block. Skip reconstruction for task:{}",
              reconInfo);
        }
      } catch (Throwable e) {
        getDatanode().decrementXmitsInProgress(xmitsSubmitted);
        LOG.warn("Failed to reconstruct striped block {}",
            reconInfo.getExtendedBlock().getLocalBlock(), e);
      }
    }
  }
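The accounting in processErasureCodingTasks follows a simple pattern: compute a weighted cost of at least 1, add it to the in-progress counter before the task is enqueued, and undo the increment if submission fails. A minimal sketch of that pattern is below. The class, the AtomicInteger counter and the weight of 0.5 are stand-ins chosen for illustration, not the DataNode's actual fields or configuration.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the DataNode-side bookkeeping.
final class XmitsAccountingSketch {
  private final AtomicInteger xmitsInProgress = new AtomicInteger();
  private final float xmitWeight;

  XmitsAccountingSketch(float xmitWeight) {
    this.xmitWeight = xmitWeight;
  }

  /** Weighted cost of one task, truncated to int and never less than 1. */
  int weightedXmits(int taskXmits) {
    return Math.max((int) (taskXmits * xmitWeight), 1);
  }

  /** Counts the task first, then enqueues it; rolls back on failure. */
  boolean submit(int taskXmits, Runnable enqueue) {
    int submitted = 0;
    try {
      submitted = weightedXmits(taskXmits);
      xmitsInProgress.addAndGet(submitted);
      enqueue.run();                 // e.g. pool.submit(task)
      return true;
    } catch (RuntimeException e) {
      xmitsInProgress.addAndGet(-submitted);
      return false;
    }
  }

  int inProgress() {
    return xmitsInProgress.get();
  }

  public static void main(String[] args) {
    XmitsAccountingSketch dn = new XmitsAccountingSketch(0.5f);
    dn.submit(6, () -> { });         // adds max((int) (6 * 0.5), 1) = 3
    dn.submit(1, () -> { });         // (int) 0.5 is 0, so the floor of 1 applies
    System.out.println(dn.inProgress());
  }
}
```

Counting the task at enqueue time, rather than when it starts running, is what lets the NN see the queued work and hold back further tasks, as the HDFS-12044 comment in the code explains.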

Once a task has been submitted to the pool, the work is done by the run() method of StripedBlockReconstructor, which implements Runnable:

  //StripedBlockReconstructor#run
  public void run() {
    try {
      initDecoderIfNecessary();

      getStripedReader().init();

      stripedWriter.init();

      reconstruct();

      stripedWriter.endTargetBlocks();

      // Currently we don't check the acks for packets, this is similar as
      // block replication.
    } catch (Throwable e) {
      LOG.warn("Failed to reconstruct striped block: {}", getBlockGroup(), e);
      getDatanode().getMetrics().incrECFailedReconstructionTasks();
    } finally {
      getDatanode().decrementXmitsInProgress(getXmits());
      final DataNodeMetrics metrics = getDatanode().getMetrics();
      metrics.incrECReconstructionTasks();
      metrics.incrECReconstructionBytesRead(getBytesRead());
      metrics.incrECReconstructionRemoteBytesRead(getRemoteBytesRead());
      metrics.incrECReconstructionBytesWritten(getBytesWritten());
      getStripedReader().close();
      stripedWriter.close();
      cleanup();
    }
  }
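The shape of run() is a try/catch/finally in which the bookkeeping and resource cleanup sit in the finally block, so they happen whether or not reconstruction succeeds. The sketch below reduces that to a list of recorded events. The class name and event names are made up for illustration and do not model the actual reader, writer or metrics.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the control flow only.
final class ReconstructionLifecycleSketch {

  /** Runs the given reconstruction step and records what happened, in order. */
  static List<String> run(Runnable reconstruct) {
    List<String> events = new ArrayList<>();
    try {
      events.add("init");            // decoder, reader and writer set-up
      reconstruct.run();
      events.add("endTargetBlocks");
    } catch (Throwable t) {
      events.add("failed");          // logged and counted as a failed task
    } finally {
      events.add("decrementXmits");  // always release the in-progress count
      events.add("close");           // always close reader and writer
    }
    return events;
  }

  public static void main(String[] args) {
    System.out.println(run(() -> { }));
    System.out.println(run(() -> { throw new IllegalStateException("read error"); }));
  }
}
```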