The read operation in BlockReader implementations: the read(ByteBuffer buf) function in BlockReaderLocalLegacy


乘风如水


Category: hadoop

Article link: https://blog.csdn.net/weixin_39935887/article/details/86485919


Previously we analyzed the four functions that construct BlockReader objects:

1. getLegacyBlockReaderLocal(), which returns a BlockReaderLocalLegacy object

2. getBlockReaderLocal(), which returns a BlockReaderLocal object

3. getRemoteBlockReaderFromDomain(), which returns a RemoteBlockReader2 object

4. getRemoteBlockReaderFromTcp(), which returns a RemoteBlockReader2 object

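These four functions return three distinct classes (BlockReaderLocalLegacy, BlockReaderLocal, and RemoteBlockReader2), all of which implement the BlockReader interface. Their relationship is shown below:

[Figure: BlockReader class diagram]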

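Next we analyze the read functions of these three classes.

Recall the read path we analyzed earlier: from the read function in FSDataInputStream, to the read function in DFSInputStream, and then to the read function in BlockReaderLocal. We did not step into the read functions of BlockReaderLocal itself at that point, so here we start analyzing the read functions at this level to see how the client actually gets the file data.

In summary, two functions of the BlockReaderLocal class are ultimately called:

First: int read(ByteBuffer buf)

Second: int read(byte[] buf, int off, int len)

Next we analyze these two functions in each of the three classes above.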

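The read functions in BlockReaderLocalLegacy

BlockReaderLocalLegacy is the old short-circuit read implementation and is largely obsolete, but it is still worth analyzing to understand its approach. We start with read(ByteBuffer buf), whose code is as follows: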

@Override
  public synchronized int read(ByteBuffer buf) throws IOException {
    int nRead = 0;
    // If checksum verification is enabled
    if (verifyChecksum) {
      // A 'direct' read actually has three phases. The first drains any
      // remaining bytes from the slow read buffer. After this the read is
      // guaranteed to be on a checksum chunk boundary. If there are still bytes
      // to read, the fast direct path is used for as many remaining bytes as
      // possible, up to a multiple of the checksum chunk size. Finally, any
      // 'odd' bytes remaining at the end of the read cause another slow read to
      // be issued, which involves an extra copy.

      // Every 'slow' read tries to fill the slow read buffer in one go for
      // efficiency's sake. As described above, all non-checksum-chunk-aligned
      // reads will be served from the slower read path.
      // If slowReadBuff still holds bytes that have not been consumed yet
      if (slowReadBuff.hasRemaining()) {
        // There are remaining bytes from a small read available. This usually
        // means this read is unaligned, which falls back to the slow path.
        // Number of bytes to copy: the smaller of the space left in buf and the
        // unread bytes left in slowReadBuff
        int fromSlowReadBuff = Math.min(buf.remaining(), slowReadBuff.remaining());
        // Copy fromSlowReadBuff bytes from slowReadBuff into buf.
        /*
         * 1. If buf.remaining() >= slowReadBuff.remaining(), fromSlowReadBuff equals
         *    slowReadBuff.remaining(), so everything left in slowReadBuff is copied into buf.
         * 2. If buf.remaining() < slowReadBuff.remaining(), bytes are copied from
         *    slowReadBuff only until buf is full.
         */
        writeSlice(slowReadBuff, buf, fromSlowReadBuff);
        nRead += fromSlowReadBuff;
      }

      if (buf.remaining() >= bytesPerChecksum && offsetFromChunkBoundary == 0) {
        // Since we have drained the 'small read' buffer, we are guaranteed to
        // be chunk-aligned
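        // Take the largest length that is a multiple of bytesPerChecksum (the number of
        // bytes per checksum chunk; a block is split into many checksum chunks, and each
        // chunk has one checksum stored in the checksum file).
        // Data is verified one checksum chunk at a time, so the fast path only requests
        // a whole number of chunks; doByteBufferRead asserts this precondition.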
        int len = buf.remaining() - (buf.remaining() % bytesPerChecksum);

        // There's only enough checksum buffer space available to checksum one
        // entire slow read buffer. This saves keeping the number of checksum
        // chunks around.
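        // Cap the length at the capacity of slowReadBuff
        len = Math.min(len, slowReadBuff.capacity());
        // Save buf's current limit
        int oldlimit = buf.limit();
        // Temporarily restrict buf's limit so that exactly len bytes can be read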
        buf.limit(buf.position() + len);
        int readResult = 0;
        try {
          // Read the data into buf and verify it against the checksums from the checksum file
          readResult = doByteBufferRead(buf);
        } finally {
          // Restore buf's original limit
          buf.limit(oldlimit);
        }
        if (readResult == -1) {
          return nRead;
        } else {
          // Add the bytes just read to the running total
          nRead += readResult;
          // Advance buf's position past the bytes just read
          buf.position(buf.position() + readResult);
        }
      }

      // offsetFromChunkBoundary > 0 => unaligned read, use slow path to read
      // until chunk boundary
      if ((buf.remaining() > 0 && buf.remaining() < bytesPerChecksum) || offsetFromChunkBoundary > 0) {
        int toRead = Math.min(buf.remaining(), bytesPerChecksum - offsetFromChunkBoundary);
        int readResult = fillSlowReadBuffer(toRead);
        if (readResult == -1) {
          return nRead;
        } else {
          int fromSlowReadBuff = Math.min(readResult, buf.remaining());
          writeSlice(slowReadBuff, buf, fromSlowReadBuff);
          nRead += fromSlowReadBuff;
        }
      }
    } else {
      // Non-checksummed reads are much easier; we can just fill the buffer directly.
      // Without checksum verification, read the data straight into buf
      nRead = doByteBufferRead(buf);
      if (nRead > 0) {
        buf.position(buf.position() + nRead);
      }
    }
    return nRead;
  }
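
To make the three phases of a checksummed read concrete, here is a minimal sketch that only does the arithmetic of one call to read(ByteBuffer buf). It is not Hadoop code: the class ReadPhaseSketch and the method plan are hypothetical names, and the sketch assumes that offsetFromChunkBoundary is 0, that the file never reaches EOF, and that the slow read buffer capacity is a multiple of bytesPerChecksum.

```java
import java.util.Arrays;

public class ReadPhaseSketch {
    /**
     * Returns {drained, direct, tail}: the bytes served by each phase of one call.
     * Assumptions (not from Hadoop): offsetFromChunkBoundary == 0, no EOF,
     * slowBufCapacity is a multiple of bytesPerChecksum.
     */
    static int[] plan(int requested, int buffered, int bytesPerChecksum, int slowBufCapacity) {
        int remaining = requested;

        // Phase 1: drain bytes left over in the slow read buffer.
        int drained = Math.min(remaining, buffered);
        remaining -= drained;

        // Phase 2: direct read of whole checksum chunks, capped at the buffer capacity.
        int direct = 0;
        if (remaining >= bytesPerChecksum) {
            direct = remaining - (remaining % bytesPerChecksum);
            direct = Math.min(direct, slowBufCapacity);
            remaining -= direct;
        }

        // Phase 3: a tail shorter than one chunk goes through the slow path again.
        int tail = 0;
        if (remaining > 0 && remaining < bytesPerChecksum) {
            tail = remaining;
        }
        return new int[] {drained, direct, tail};
    }

    public static void main(String[] args) {
        // 100 bytes drained, 1024 bytes (2 chunks) read directly, 176 odd bytes at the end.
        System.out.println(Arrays.toString(plan(1300, 100, 512, 4096)));  // [100, 1024, 176]
        // The cap on phase 2 means one call can return fewer bytes than requested.
        System.out.println(Arrays.toString(plan(10000, 0, 512, 1024)));   // [0, 1024, 0]
    }
}
```

The second call shows why callers of read must loop: when the request is much larger than the slow read buffer, a single call serves at most one buffer's worth of aligned data and leaves the rest for the next call.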

The code of doByteBufferRead is as follows:

  /**
   * Tries to read as many bytes as possible into supplied buffer, checksumming
   * each chunk if needed.
   *
   * <b>Preconditions:</b>
   * <ul>
   * <li>
   * If checksumming is enabled, buf.remaining must be a multiple of
   * bytesPerChecksum. Note that this is not a requirement for clients of
   * read(ByteBuffer) - in the case of non-checksum-sized read requests,
   * read(ByteBuffer) will substitute a suitably sized buffer to pass to this
   * method.
   * </li>
   * </ul>
   * <b>Postconditions:</b>
   * <ul>
   * <li>buf.limit and buf.mark are unchanged.</li>
   * <li>buf.position += min(offsetFromChunkBoundary, totalBytesRead) - so the
   * requested bytes can be read straight from the buffer</li>
   * </ul>
   *
   * @param buf
   *          byte buffer to write bytes to. If checksums are not required, buf
   *          can have any number of bytes remaining, otherwise there must be a
   *          multiple of the checksum chunk size remaining.
   * @return <tt>max(min(totalBytesRead, len) - offsetFromChunkBoundary, 0)</tt>
   *         that is, the number of useful bytes (up to the amount
   *         requested) readable from the buffer by the client.
   */
  // For a checksummed read, buf.remaining() (limit - position) must be a multiple of the checksum chunk size
  private synchronized int doByteBufferRead(ByteBuffer buf) throws IOException {
    if (verifyChecksum) {
      assert buf.remaining() % bytesPerChecksum == 0;
    }
    int dataRead = -1;

    int oldpos = buf.position();
    // Read as much as we can into the buffer.
    // Read from dataIn into buf until dataIn is exhausted or buf has no space left
    dataRead = fillBuffer(dataIn, buf);

    if (dataRead == -1) {
      return -1;
    }
    // If checksum verification is enabled
    if (verifyChecksum) {
      // Create a ByteBuffer view that shares buf's contents but has its own position and limit
      ByteBuffer toChecksum = buf.duplicate();
      toChecksum.position(oldpos);
      toChecksum.limit(oldpos + dataRead);

      checksumBuff.clear();
      // Equivalent to (int)Math.ceil(toChecksum.remaining() * 1.0 / bytesPerChecksum );
      // Number of checksum chunks needed to cover the bytes just read (integer ceiling division, a handy idiom)
      int numChunks =
        (toChecksum.remaining() + bytesPerChecksum - 1) / bytesPerChecksum;
      
      checksumBuff.limit(checksumSize * numChunks);

      // Read the checksums from checksumIn into checksumBuff
      fillBuffer(checksumIn, checksumBuff);
      // flip(): set limit to the current position and reset position to 0, ready for reading
      checksumBuff.flip();
      // Verify the data; a mismatch throws an exception
      checksum.verifyChunkedSums(toChecksum, checksumBuff, filename,
          this.startOffset);
    }

    if (dataRead >= 0) {
      // Data was read: move buf's position to oldpos + min(offsetFromChunkBoundary, dataRead)
      buf.position(oldpos + Math.min(offsetFromChunkBoundary, dataRead));
    }

    if (dataRead < offsetFromChunkBoundary) {
      // yikes, didn't even get enough bytes to honour offset. This can happen
      // even if we are verifying checksums if we are at EOF.
      offsetFromChunkBoundary -= dataRead;
      dataRead = 0;
    } else {
      dataRead -= offsetFromChunkBoundary;
      offsetFromChunkBoundary = 0;
    }

    return dataRead;
  }
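
The checksum step in doByteBufferRead can be illustrated with a small standalone sketch. It is not Hadoop's DataChecksum: it swaps in java.util.zip.CRC32 with one 4-byte checksum per chunk, and ChunkedChecksumSketch, numChunks, computeSums and verify are hypothetical names chosen for illustration. What it does share with the code above is the ceiling-division idiom for the chunk count and the use of duplicate() so that the caller's position and limit are left untouched.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

public class ChunkedChecksumSketch {
    // Integer ceiling division: chunks needed to cover dataLen bytes.
    static int numChunks(int dataLen, int bytesPerChecksum) {
        return (dataLen + bytesPerChecksum - 1) / bytesPerChecksum;
    }

    // Computes one 4-byte CRC32 per chunk over data's remaining bytes.
    static ByteBuffer computeSums(ByteBuffer data, int bytesPerChecksum) {
        ByteBuffer view = data.duplicate();  // shares contents, independent position/limit
        ByteBuffer sums = ByteBuffer.allocate(4 * numChunks(view.remaining(), bytesPerChecksum));
        while (view.hasRemaining()) {
            int n = Math.min(bytesPerChecksum, view.remaining());  // last chunk may be short
            ByteBuffer chunk = view.duplicate();
            chunk.limit(chunk.position() + n);
            CRC32 crc = new CRC32();
            crc.update(chunk);
            sums.putInt((int) crc.getValue());
            view.position(view.position() + n);
        }
        sums.flip();  // limit = bytes written, position = 0
        return sums;
    }

    // Returns true if every chunk of data matches the stored checksums.
    static boolean verify(ByteBuffer data, ByteBuffer sums, int bytesPerChecksum) {
        return computeSums(data, bytesPerChecksum).equals(sums.duplicate());
    }

    public static void main(String[] args) {
        byte[] bytes = new byte[10];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) i;
        }
        ByteBuffer data = ByteBuffer.wrap(bytes);
        ByteBuffer sums = computeSums(data, 4);

        System.out.println(numChunks(10, 4));       // 3
        System.out.println(verify(data, sums, 4));  // true
        data.put(0, (byte) 99);                     // corrupt one byte
        System.out.println(verify(data, sums, 4));  // false
    }
}
```

Unlike this sketch, which returns a boolean, the real verifyChunkedSums call reports a mismatch by throwing an exception, as the comments in the code above note.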

The code of writeSlice is as follows:

/**
   * Utility method used by read(ByteBuffer) to partially copy a ByteBuffer into
   * another.
   */
  private void writeSlice(ByteBuffer from, ByteBuffer to, int length) {
    // Save from's current limit
    int oldLimit = from.limit();
    // Temporarily set the limit to from.position() + length so that exactly length bytes are copied
    from.limit(from.position() + length);
    try {
      // Copy the bytes between from's position and limit into to
      to.put(from);
    } finally {
      // Restore from's original limit
      from.limit(oldLimit);
    }
  }
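
The effect of writeSlice on the two buffers is easy to check in isolation. The sketch below copies the same logic into a hypothetical standalone class, WriteSliceDemo, with the method made static so that it runs without any Hadoop classes; the buffer contents are made up for the example.

```java
import java.nio.ByteBuffer;

public class WriteSliceDemo {
    // Same logic as writeSlice above, made static so it can run on its own.
    static void writeSlice(ByteBuffer from, ByteBuffer to, int length) {
        int oldLimit = from.limit();
        from.limit(from.position() + length);  // expose exactly length bytes
        try {
            to.put(from);                      // copies position..limit and advances both buffers
        } finally {
            from.limit(oldLimit);              // the rest of from becomes readable again
        }
    }

    public static void main(String[] args) {
        ByteBuffer from = ByteBuffer.wrap(new byte[] {1, 2, 3, 4, 5, 6, 7, 8});
        from.position(2);                      // pretend two bytes were already consumed
        ByteBuffer to = ByteBuffer.allocate(4);

        int n = Math.min(to.remaining(), from.remaining());  // min(4, 6) = 4
        writeSlice(from, to, n);

        System.out.println(from.position());   // 6
        System.out.println(from.remaining());  // 2
        System.out.println(to.remaining());    // 0
    }
}
```

Without the temporary limit, to.put(from) would try to copy all six remaining bytes of from into a buffer with room for four and throw a BufferOverflowException, which is why read computes the minimum of the two remaining() values before calling writeSlice.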

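To summarize: the function reads the block data, verifies it against the data from the checksum file, and throws an exception if verification fails.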
