Hadoop Study Notes 5

This note explains how HDFS, in EC (erasure coding) mode, handles a target node that fails while a file is being downloaded.

Process overview

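The process by which a user downloads a file from HDFS is shown in the figure above. In web download mode, the user first logs in to the web server, and the web server obtains the file's storage information from the namenode. The client then gets this storage information from the web server and starts downloading the file from one of the servers that store it (in practice, the first one). The selected server (call it currentnode) holds only part of the file's data; the rest is supplied to currentnode by other nodes (in practice, currentnode reads the data at the specified locations from the other datanodes). During this process, a datanode that is supposed to supply data to currentnode may become unreachable (fail to connect). In that case currentnode reads data from a different datanode instead. In EC mode, what it reads at that point is generally a parity block, so the data has to be decoded before it can be returned.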
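The recovery idea in the last sentence can be illustrated with the simplest possible erasure code. The sketch below swaps in a single XOR parity cell instead of the Reed-Solomon (or LRC) codecs that HDFS EC actually uses, and XorStripeDemo with its methods are hypothetical names: when the datanode holding one data cell cannot be reached, that cell is rebuilt from the surviving data cells plus the parity cell.

```java
import java.util.Arrays;

// Simplified model: one XOR parity cell stands in for the Reed-Solomon/LRC
// codecs HDFS actually uses. Class and method names are hypothetical.
class XorStripeDemo {
    // Parity cell = byte-wise XOR of all data cells.
    static byte[] encodeParity(byte[][] dataCells) {
        byte[] parity = new byte[dataCells[0].length];
        for (byte[] cell : dataCells) {
            for (int j = 0; j < parity.length; j++) {
                parity[j] ^= cell[j];
            }
        }
        return parity;
    }

    // Rebuild the cell at missingIndex from the other data cells and the parity.
    static byte[] recover(byte[][] dataCells, int missingIndex, byte[] parity) {
        byte[] out = parity.clone();
        for (int i = 0; i < dataCells.length; i++) {
            if (i == missingIndex) {
                continue; // this "datanode" is unreachable, so its cell is not read
            }
            for (int j = 0; j < out.length; j++) {
                out[j] ^= dataCells[i][j];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        byte[][] data = { {1, 2, 3}, {4, 5, 6}, {7, 8, 9} };
        byte[] parity = encodeParity(data);
        System.out.println(Arrays.toString(recover(data, 1, parity))); // [4, 5, 6]
    }
}
```

With Reed-Solomon the principle is the same (any dataBlkNum of the dataBlkNum + parityBlkNum cells are enough), but decoding is matrix arithmetic over a finite field rather than plain XOR.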

Source code analysis

We start the analysis from DFSClient.

/**
 * Create an input stream that obtains a nodelist from the
 * namenode, and then reads from all the right places.  Creates
 * inner subclass of InputStream that does the right out-of-band
 * work.
 */
public DFSInputStream open(String src, int buffersize, boolean verifyChecksum)
    throws IOException {
  checkOpen();
  //    Get block info from namenode
  try (TraceScope ignored = newPathTraceScope("newDFSInputStream", src)) {
    LocatedBlocks locatedBlocks = getLocatedBlocks(src, 0);
    return openInternal(locatedBlocks, src, verifyChecksum);
  }
}

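First comes open(), which returns a DFSInputStream. As its comment explains, it gets the file's block locations from the namenode and builds an input stream that reads the data from the right places. It delegates the actual construction to openInternal().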

private DFSInputStream openInternal(LocatedBlocks locatedBlocks, String src,
    boolean verifyChecksum) throws IOException {
  if (locatedBlocks != null) {
    ErasureCodingPolicy ecPolicy = locatedBlocks.getErasureCodingPolicy();
    if (ecPolicy != null) {
      return new DFSStripedInputStream(this, src, verifyChecksum, ecPolicy,
          locatedBlocks);
    }
    return new DFSInputStream(this, src, verifyChecksum, locatedBlocks);
  } else {
    throw new IOException("Cannot open filename " + src);
  }
}

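The code above is easy to follow: depending on whether the file uses an EC policy, it calls a different InputStream constructor (DFSStripedInputStream for erasure-coded files, DFSInputStream otherwise). Next, let's look at DFSStripedInputStream.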
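To make the dispatch explicit, here is a stripped-down sketch. The Located class and the returned strings are hypothetical stand-ins for LocatedBlocks and the real stream objects, and "RS-6-3-1024k" is used only as a sample policy label; only the null check and the EC-policy branch follow openInternal().

```java
import java.io.IOException;

// Hypothetical sketch of the dispatch in openInternal(). "Located" stands in
// for LocatedBlocks; strings stand in for the real stream objects.
class OpenDispatchDemo {
    static final class Located {
        final String ecPolicyName; // null: replicated (non-EC) file
        Located(String ecPolicyName) {
            this.ecPolicyName = ecPolicyName;
        }
    }

    static String openInternal(Located located, String src) throws IOException {
        if (located == null) {
            throw new IOException("Cannot open filename " + src);
        }
        if (located.ecPolicyName != null) {
            // Erasure-coded file: the striped stream reads cells from many datanodes.
            return "DFSStripedInputStream[" + located.ecPolicyName + "]";
        }
        // Replicated file: the ordinary stream reads each block from one replica.
        return "DFSInputStream";
    }

    public static void main(String[] args) throws IOException {
        System.out.println(openInternal(new Located("RS-6-3-1024k"), "/ec/file"));
        System.out.println(openInternal(new Located(null), "/plain/file"));
    }
}
```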

Reading starts in the readWithStrategy() method.

protected synchronized int readWithStrategy(ReaderStrategy strategy)
    throws IOException {
  dfsClient.checkOpen();
  if (closed.get()) {
    throw new IOException("Stream closed");
  }

  int len = strategy.getTargetLength();
  CorruptedBlocks corruptedBlocks = new CorruptedBlocks();
  if (pos < getFileLength()) {
    try {
      if (pos > blockEnd) {
        blockSeekTo(pos);
      }
      int realLen = (int) Math.min(len, (blockEnd - pos + 1L));
      synchronized (infoLock) {
        if (locatedBlocks.isLastBlockComplete()) {
          realLen = (int) Math.min(realLen,
              locatedBlocks.getFileLength() - pos);
        }
      }

      /** Number of bytes already read into buffer */
      int result = 0;
      while (result < realLen) {
        if (!curStripeRange.include(getOffsetInBlockGroup())) {
          readOneStripe(corruptedBlocks);
        }
        int ret = copyToTargetBuf(strategy, realLen - result);
        result += ret;
        pos += ret;
      }
      return result;
    } finally {
      // Check if need to report block replicas corruption either read
      // was successful or ChecksumException occurred.
      reportCheckSumFailure(corruptedBlocks, getCurrentBlockLocationsLength(),
          true);
    }
  }
  return -1;
}

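The method as a whole reads data according to the given read strategy, and the strategy itself is simple: essentially a target buffer plus the number of bytes to read. Inside the while loop the method calls readOneStripe(). It does so whenever the current offset in the block group (getOffsetInBlockGroup()) falls outside the range of the cached stripe (curStripeRange) while fewer than realLen bytes have been copied (result < realLen). In other words, the data buffered in curStripeBuf has been used up, or does not yet cover the current position, so the next stripe has to be fetched. This condition by itself does not mean a block is corrupt or unavailable; unreachable datanodes are dealt with further down, inside readStripe(). Let's look at readOneStripe().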
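The refill condition can be modelled without HDFS at all. The sketch below is a simplification under stated assumptions: a byte array stands in for the block group, stripeLen plays the role of cellSize * dataBlkNum, and StripeCacheReader, read() and stripeReads are hypothetical names. Unlike the real curStripeRange, which starts at the current offset, the cached range here always starts at a stripe boundary.

```java
import java.util.Arrays;

// Hypothetical model of the loop in readWithStrategy(): a stripe-sized cache
// (like curStripeBuf) is refilled only when pos leaves the cached range.
class StripeCacheReader {
    private final byte[] file;     // stands in for the block group's bytes
    private final int stripeLen;   // cellSize * dataBlkNum in the real code
    private long pos = 0;
    private long cachedStart = -1; // start offset of the cached stripe
    private int cachedLen = 0;
    private byte[] cache = new byte[0];
    int stripeReads = 0;           // how often readOneStripe() would run

    StripeCacheReader(byte[] file, int stripeLen) {
        this.file = file;
        this.stripeLen = stripeLen;
    }

    private boolean inCachedRange(long p) {
        return cachedStart >= 0 && p >= cachedStart && p < cachedStart + cachedLen;
    }

    // Load the whole stripe containing pos; the last stripe may be shorter.
    private void readOneStripe() {
        long start = (pos / stripeLen) * stripeLen;
        cachedLen = (int) Math.min(stripeLen, file.length - start);
        cache = Arrays.copyOfRange(file, (int) start, (int) start + cachedLen);
        cachedStart = start;
        stripeReads++;
    }

    // Copy up to len bytes into dst; returns -1 at end of file.
    int read(byte[] dst, int len) {
        if (pos >= file.length) {
            return -1;
        }
        int realLen = (int) Math.min(len, file.length - pos);
        int result = 0;
        while (result < realLen) {
            if (!inCachedRange(pos)) {
                readOneStripe();
            }
            int off = (int) (pos - cachedStart);
            int n = Math.min(realLen - result, cachedLen - off);
            System.arraycopy(cache, off, dst, result, n);
            result += n;
            pos += n;
        }
        return result;
    }
}
```

Reading 6 bytes of a 10-byte "file" with stripeLen = 4 touches two stripes; a following read starts inside the second cached stripe and only has to fetch the short third one.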

/**
 * Read a new stripe covering the current position, and store the data in the
 * {@link #curStripeBuf}.
 */
private void readOneStripe(CorruptedBlocks corruptedBlocks)
    throws IOException {
  resetCurStripeBuffer(true);

  // compute stripe range based on pos
  final long offsetInBlockGroup = getOffsetInBlockGroup();
  final long stripeLen = cellSize * dataBlkNum;
  final int stripeIndex = (int) (offsetInBlockGroup / stripeLen);
  final int stripeBufOffset = (int) (offsetInBlockGroup % stripeLen);
  final int stripeLimit = (int) Math.min(currentLocatedBlock.getBlockSize()
      - (stripeIndex * stripeLen), stripeLen);
  StripeRange stripeRange =
      new StripeRange(offsetInBlockGroup, stripeLimit - stripeBufOffset);

  LocatedStripedBlock blockGroup = (LocatedStripedBlock) currentLocatedBlock;
  AlignedStripe[] stripes = StripedBlockUtil.divideOneStripe(ecPolicy,
      cellSize, blockGroup, offsetInBlockGroup,
      offsetInBlockGroup + stripeRange.getLength() - 1, curStripeBuf);
  final LocatedBlock[] blks = StripedBlockUtil.parseStripedBlockGroup(
      blockGroup, cellSize, dataBlkNum, parityBlkNum, localParityBlkNum);
  // read the whole stripe
  for (AlignedStripe stripe : stripes) {
    // Parse group to get chosen DN location
    StripeReader sreader = new StatefulStripeReader(stripe, ecPolicy, blks,
        blockReaders, corruptedBlocks, decoder, this);
    sreader.readStripe();
  }
  curStripeBuf.position(stripeBufOffset);
  curStripeBuf.limit(stripeLimit);
  curStripeRange = stripeRange;
}

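As its comment says, this method reads a new stripe covering the current position and stores the data in curStripeBuf. divideOneStripe() and parseStripedBlockGroup() will be analyzed in a later note; everything before the loop only sets up the stripe read. Below we focus on the core of the method, readStripe().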
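The arithmetic at the top of readOneStripe() is easier to see with numbers. The worked example below assumes an RS-6-3 layout with 1 MiB cells (cellSize = 1048576, dataBlkNum = 6), a 10 MiB block group, and a read position 7 MiB into the group; StripeMath.locate is a hypothetical helper that copies the formulas from readOneStripe().

```java
import java.util.Arrays;

// Worked example of the stripe arithmetic in readOneStripe().
// The helper class and the concrete numbers are illustrative assumptions.
class StripeMath {
    // Returns {stripeIndex, stripeBufOffset, stripeLimit, rangeLength}.
    static long[] locate(long offsetInBlockGroup, long blockGroupSize,
                         int cellSize, int dataBlkNum) {
        final long stripeLen = (long) cellSize * dataBlkNum;
        final int stripeIndex = (int) (offsetInBlockGroup / stripeLen);
        final int stripeBufOffset = (int) (offsetInBlockGroup % stripeLen);
        // The last stripe of a block group may be shorter than stripeLen.
        final int stripeLimit = (int) Math.min(
            blockGroupSize - stripeIndex * stripeLen, stripeLen);
        // Length passed to new StripeRange(offsetInBlockGroup, ...).
        final int rangeLength = stripeLimit - stripeBufOffset;
        return new long[] {stripeIndex, stripeBufOffset, stripeLimit, rangeLength};
    }

    public static void main(String[] args) {
        int cellSize = 1024 * 1024;
        long[] r = locate(7L * cellSize, 10L * cellSize, cellSize, 6);
        System.out.println(Arrays.toString(r)); // [1, 1048576, 4194304, 3145728]
    }
}
```

The position falls in stripe 1, 1 MiB into that stripe's buffer; since only 4 MiB of the block group remain after stripe 0, stripeLimit is 4 MiB, and the new stripe range covers the remaining 3 MiB.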

/**
 * read the whole stripe. do decoding if necessary
 */
void readStripe() throws IOException {
  alignedStripe.missingLocalChunksNum = new int[ecPolicy.getNumLocalParityUnits()];
  // Compare codec names with equals(); "!=" would compare object references.
  if (!"lrc".equals(ecPolicy.getSchema().getCodecName())) {
    // Non-LRC codec (e.g. RS): start reading every non-empty data chunk.
    for (int i = 0; i < dataBlkNum; i++) {
      if (alignedStripe.chunks[i] != null &&
          alignedStripe.chunks[i].state != StripingChunk.ALLZERO) {
        if (!readChunk(targetBlocks[i], i)) {
          alignedStripe.missingChunksNum++;
        }
      }
    }
    if (alignedStripe.missingChunksNum > 0) {
      checkMissingBlocks();
      readDataForDecoding();
      // read parity chunks
      readParityChunks(alignedStripe.missingChunksNum);
    }
  } else {
    // LRC: assumes two contiguous local groups of
    // dataBlkNum / localParityBlkNum data units each.
    final int localGroupSize = dataBlkNum / localParityBlkNum;
    for (int i = 0; i < dataBlkNum; i++) {
      if (alignedStripe.chunks[i] != null &&
          alignedStripe.chunks[i].state != StripingChunk.ALLZERO) {
        if (!readChunk(targetBlocks[i], i)) {
          alignedStripe.missingChunksNum++;
          // Indices [0, localGroupSize) form group 0, the rest group 1.
          if (i >= localGroupSize) {
            alignedStripe.missingLocalChunksNum[1]++;
          } else {
            alignedStripe.missingLocalChunksNum[0]++;
          }
        }
      }
    }
    if (alignedStripe.missingLocalChunksNum[0] > 0
        || alignedStripe.missingLocalChunksNum[1] > 0) {
      checkMissingBlocks();
      readDataForDecoding();
      // read local parity chunks
      readLocalParityChunks(alignedStripe.missingLocalChunksNum);
    }
  }

  // There are missing block locations at this stage. Thus we need to read
  // the full stripe and one more parity block.

  // TODO: for a full stripe we can start reading (dataBlkNum + 1) chunks

  // Input buffers for potential decode operation, which remains null until
  // first read failure
  while (!futures.isEmpty()) {
    try {
      StripingChunkReadResult r = StripedBlockUtil
          .getNextCompletedStripedRead(service, futures, 0);
      dfsStripedInputStream.updateReadStats(r.getReadStats());
      if (DFSClient.LOG.isDebugEnabled()) {
        DFSClient.LOG.debug("Read task returned: " + r + ", for stripe "
            + alignedStripe);
      }
      StripingChunk returnedChunk = alignedStripe.chunks[r.index];
      Preconditions.checkNotNull(returnedChunk);
      Preconditions.checkState(returnedChunk.state == StripingChunk.PENDING);

      if (r.state == StripingChunkReadResult.SUCCESSFUL) {
        returnedChunk.state = StripingChunk.FETCHED;
        alignedStripe.fetchedChunksNum++;
        updateState4SuccessRead(r);
        if (alignedStripe.fetchedChunksNum == dataBlkNum) {
          clearFutures();
          break;
        }
      } else {
        returnedChunk.state = StripingChunk.MISSING;
        // close the corresponding reader
        dfsStripedInputStream.closeReader(readerInfos[r.index]);

        final int missing = alignedStripe.missingChunksNum;
        alignedStripe.missingChunksNum++;
        checkMissingBlocks();

        readDataForDecoding();
        readParityChunks(alignedStripe.missingChunksNum - missing);
      }
    } catch (InterruptedException ie) {
      String err = "Read request interrupted";
      DFSClient.LOG.error(err);
      clearFutures();
      // Don't decode if read interrupted
      throw new InterruptedIOException(err);
    }
  }

  if (alignedStripe.missingChunksNum > 0) {
    decode();
  }
}

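This method has been modified to support LRC (Locally Repairable Codes). Recall from the earlier notes that a chunk is the smallest unit of a stripe read: each chunk is the portion of a cell on one internal block, and the chunks of an aligned stripe cover both the data cells and the parity cells. readStripe() first starts reads for the non-empty data chunks of the stripe (up to dataBlkNum of them) and counts the chunks whose read could not be started in alignedStripe.missingChunksNum (for LRC, also per local group in missingLocalChunksNum). If any are missing, it reads parity chunks through readParityChunks() or, for LRC, readLocalParityChunks(). futures holds the outstanding asynchronous chunk reads, and the while loop collects them one by one: a successful read marks the chunk FETCHED, and the loop stops once dataBlkNum chunks have been fetched; a failed read (for example, a datanode that drops the connection mid-read) marks the chunk MISSING, closes its reader and triggers one more parity read. Finally, if missingChunksNum > 0, some data could not be read directly (so parity chunks have necessarily been read), and decode() is called to reconstruct the missing data.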
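The wait loop can be simulated with java.util.concurrent. This is an assumption-laden model, not HDFS code: a boolean array plays the role of datanode health, indices 0 to dataBlkNum - 1 are data units and the following ones are parity units, and StripeReadLoopDemo, Outcome and submit() are hypothetical names. Each failed read raises the missing count, aborts once it exceeds parityBlkNum (the same idea as checkMissingBlocks()), and otherwise schedules one more parity read, as the non-LRC failure path does with readParityChunks().

```java
import java.io.IOException;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical simulation of the wait loop in readStripe(). Node health is
// faked with a boolean array; the names are illustrative, not HDFS APIs.
class StripeReadLoopDemo {
    static final class Outcome {
        final int fetched, missing, parityRead;
        Outcome(int fetched, int missing, int parityRead) {
            this.fetched = fetched;
            this.missing = missing;
            this.parityRead = parityRead;
        }
        boolean needsDecode() { return missing > 0; }
    }

    // Simulated chunk read: yields the unit index on success, -1 - index on failure.
    private static void submit(CompletionService<Integer> cs, boolean[] nodeUp, int index) {
        cs.submit(() -> nodeUp[index] ? index : -1 - index);
    }

    static Outcome readStripe(boolean[] nodeUp, int dataBlkNum, int parityBlkNum)
            throws IOException, InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        CompletionService<Integer> cs = new ExecutorCompletionService<>(pool);
        try {
            int pending = 0;
            for (int i = 0; i < dataBlkNum; i++) { // start with the data units
                submit(cs, nodeUp, i);
                pending++;
            }
            int nextParity = dataBlkNum;            // parity units follow data units
            int fetched = 0, missing = 0, parityRead = 0;
            while (pending > 0) {
                int r;
                try {
                    r = cs.take().get();
                } catch (ExecutionException e) {
                    throw new IOException(e.getCause());
                }
                pending--;
                if (r >= 0) {
                    if (++fetched == dataBlkNum) {
                        break;                      // enough chunks to serve or decode
                    }
                } else {
                    missing++;
                    if (missing > parityBlkNum) {   // too many failures to decode
                        throw new IOException(missing + " missing chunks, only "
                            + parityBlkNum + " parity units");
                    }
                    // missing <= parityBlkNum guarantees an unread parity unit remains.
                    submit(cs, nodeUp, nextParity++);
                    pending++;
                    parityRead++;
                }
            }
            return new Outcome(fetched, missing, parityRead);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // RS(6,3)-like layout: indices 0-5 are data units, 6-8 parity units.
        boolean[] nodeUp = {true, false, true, true, true, true, false, true, true};
        Outcome o = readStripe(nodeUp, 6, 3);
        // prints "fetched=6 missing=2 parityRead=2 decode=true"
        System.out.println("fetched=" + o.fetched + " missing=" + o.missing
            + " parityRead=" + o.parityRead + " decode=" + o.needsDecode());
    }
}
```

In main(), data unit 1 and the first parity unit are down, so the loop reads two parity units and still ends with six fetched chunks: exactly the situation in which decode() would run. The counts do not depend on the order in which the reads complete.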
