Overview of the HDFS File Read Flow

乘风如水

Article link: https://blog.csdn.net/weixin_39935887/article/details/87279021

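We previously covered the read functions of the FSDataInputStream class across four posts: read(1), read(2), read(3) and read(4). Those read functions call the corresponding read functions of the DFSInputStream class. DFSInputStream defines the following read functions: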

// First read function
public synchronized int read()

// Second read function
public synchronized int read(final byte buf[], int off, int len)

// Third read function
public synchronized int read(final ByteBuffer buf)

// Fourth read function (positional read, pread)
public int read(long position, byte[] buffer, int offset, int length)

// Fifth read function (zero-copy read)
public synchronized ByteBuffer read(ByteBufferPool bufferPool, int maxLength, EnumSet<ReadOption> opts)
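The delegation described above can be illustrated with a small model. The sketch below rests on a few assumptions. The hypothetical `WrapperStream` plays the role of FSDataInputStream and simply forwards to an inner stream through `FilterInputStream`. The hypothetical `BlockStream` plays the role of DFSInputStream. It implements the single-byte `read()` on top of `read(byte[], int, int)` using a one-byte scratch buffer. Both classes and their in-memory data are illustrative only, not Hadoop code.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stand-in for DFSInputStream: serves bytes from an in-memory "block".
class BlockStream extends InputStream {
    private final byte[] data;
    private final byte[] oneByteBuf = new byte[1]; // scratch buffer used by read()
    private int pos = 0;

    BlockStream(byte[] data) {
        this.data = data;
    }

    // Single-byte read built on top of the bulk read; returns 0..255 or -1 at EOF.
    @Override
    public synchronized int read() throws IOException {
        int n = read(oneByteBuf, 0, 1);
        return n <= 0 ? -1 : (oneByteBuf[0] & 0xff);
    }

    // Bulk read: copy up to len bytes from the current position into buf[off..].
    @Override
    public synchronized int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (pos >= data.length) {
            return -1; // end of the "block"
        }
        int n = Math.min(len, data.length - pos);
        System.arraycopy(data, pos, buf, off, n);
        pos += n;
        return n;
    }
}

// Hypothetical stand-in for FSDataInputStream: FilterInputStream forwards
// read(), read(byte[]) and read(byte[], int, int) to the wrapped stream.
class WrapperStream extends FilterInputStream {
    WrapperStream(InputStream in) {
        super(in);
    }
}
```

Take a wrapper around the bytes {0xff, 2, 3}. The first `read()` returns 255. A following `read(b, 0, 4)` returns 2, because only two bytes remain.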

All five read functions ultimately create a BlockReaderFactory object and call its build function. Its code is as follows:

/**
   * Build a BlockReader with the given options.
   *
   * This function will do the best it can to create a block reader that meets
   * all of our requirements.  We prefer short-circuit block readers
   * (BlockReaderLocal and BlockReaderLocalLegacy) over remote ones, since the
   * former avoid the overhead of socket communication.  If short-circuit is
   * unavailable, our next fallback is data transfer over UNIX domain sockets,
   * if dfs.client.domain.socket.data.traffic has been enabled.  If that doesn't
   * work, we will try to create a remote block reader that operates over TCP
   * sockets.
   *
   * There are a few caches that are important here.
   *
   * The ShortCircuitCache stores file descriptor objects which have been passed
   * from the DataNode. 
   *
   * The DomainSocketFactory stores information about UNIX domain socket paths
   * that we have not been able to use in the past, so that we don't waste time
   * retrying them over and over.  (Like all the caches, it does have a timeout,
   * though.)
   *
   * The PeerCache stores peers that we have used in the past.  If we can reuse
   * one of these peers, we avoid the overhead of re-opening a socket.  However,
   * if the socket has been timed out on the remote end, our attempt to reuse
   * the socket may end with an IOException.  For that reason, we limit our
   * attempts at socket reuse to dfs.client.cached.conn.retry times.  After
   * that, we create new sockets.  This avoids the problem where a thread tries
   * to talk to a peer that it hasn't talked to in a while, and has to clean out
   * every entry in a socket cache full of stale entries.
   *
   * @return The new BlockReader.  We will not return null.
   *
   * @throws InvalidToken
   *             If the block token was invalid.
   *         InvalidEncryptionKeyException
   *             If the encryption key was invalid.
   *         Other IOException
   *             If there was another problem.
   */
  public BlockReader build() throws IOException {
    BlockReader reader = null;

    Preconditions.checkNotNull(configuration);
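    // If short-circuit local reads are enabled
    if (conf.shortCircuitLocalReads && allowShortCircuitLocalReads) {
      // Check whether the legacy short-circuit read (HDFS-2246) is in use. In that scheme the
      // client obtains the block file paths from the DataNode via RPC and reads the data
      // directly through those paths; since this exposes all of the file data, it is not very secure.
      if (clientContext.getUseLegacyBlockReaderLocal()) {
        // Obtain a BlockReaderLocalLegacy object
        reader = getLegacyBlockReaderLocal();
        if (reader != null) {
          if (LOG.isTraceEnabled()) {
            LOG.trace(this + ": returning new legacy block reader local.");
          }
          return reader;
        }
      } else { // Legacy mode not in use: perform the new (HDFS-347) short-circuit read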
        reader = getBlockReaderLocal();
        if (reader != null) {
          if (LOG.isTraceEnabled()) {
            LOG.trace(this + ": returning new block reader local.");
          }
          return reader;
        }
      }
    }
    if (conf.domainSocketDataTraffic) {
      reader = getRemoteBlockReaderFromDomain();
      if (reader != null) {
        if (LOG.isTraceEnabled()) {
          LOG.trace(this + ": returning new remote block reader using " +
              "UNIX domain socket on " + pathInfo.getPath());
        }
        return reader;
      }
    }
    Preconditions.checkState(!DFSInputStream.tcpReadsDisabledForTesting,
        "TCP reads were disabled for testing, but we failed to " +
        "do a non-TCP read.");
    return getRemoteBlockReaderFromTcp();
  }
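The selection order in build() can be modeled on its own. The sketch below is a simplified, hypothetical model, not Hadoop code. Each strategy is a `Supplier` that returns a reader name, or `null` when that path is unavailable; this mirrors how the get* helpers return null on failure. The boolean fields stand in for `conf.shortCircuitLocalReads`, `allowShortCircuitLocalReads`, `clientContext.getUseLegacyBlockReaderLocal()` and `conf.domainSocketDataTraffic`.

```java
import java.util.function.Supplier;

// Hypothetical model of the fallback order in BlockReaderFactory.build().
// Readers are represented by name; a null result means "this path is unavailable".
class ReaderSelector {
    boolean shortCircuitLocalReads;      // models conf.shortCircuitLocalReads
    boolean allowShortCircuitLocalReads; // models allowShortCircuitLocalReads
    boolean useLegacyBlockReaderLocal;   // models clientContext.getUseLegacyBlockReaderLocal()
    boolean domainSocketDataTraffic;     // models conf.domainSocketDataTraffic

    Supplier<String> legacyLocal = () -> null;  // models getLegacyBlockReaderLocal()
    Supplier<String> local = () -> null;        // models getBlockReaderLocal()
    Supplier<String> domainRemote = () -> null; // models getRemoteBlockReaderFromDomain()
    Supplier<String> tcpRemote = () -> "RemoteBlockReader2 over TCP"; // models getRemoteBlockReaderFromTcp()

    String build() {
        String reader;
        // 1. Short-circuit: legacy (HDFS-2246) or new (HDFS-347), never both.
        if (shortCircuitLocalReads && allowShortCircuitLocalReads) {
            reader = useLegacyBlockReaderLocal ? legacyLocal.get() : local.get();
            if (reader != null) {
                return reader;
            }
        }
        // 2. Remote read over a UNIX domain socket, if enabled.
        if (domainSocketDataTraffic) {
            reader = domainRemote.get();
            if (reader != null) {
                return reader;
            }
        }
        // 3. Unconditional fallback: remote read over TCP.
        return tcpRemote.get();
    }
}
```

Suppose legacy mode is selected but the legacy reader cannot be created. In that case build() does not fall back to the new short-circuit reader. It moves straight on to the domain-socket and TCP paths, and the model behaves the same way.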

The build function creates an object of a class that implements the BlockReader interface. The implementation hierarchy is shown below:

[Figure: BlockReader class hierarchy]

Specifically:

1. getLegacyBlockReaderLocal returns a BlockReaderLocalLegacy object;

2. getBlockReaderLocal returns a BlockReaderLocal object;

3. getRemoteBlockReaderFromDomain and getRemoteBlockReaderFromTcp both return RemoteBlockReader2 objects.
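All of these classes implement the same BlockReader interface, so the caller can read a block without knowing which transport was chosen. A minimal sketch of that polymorphism follows. It rests on several simplifications that are assumptions for illustration:

- `SimpleBlockReader` is a hypothetical single-method stand-in for the real interface, which has more methods.
- The stub classes serve bytes from memory instead of local files or sockets.
- `ReaderClient` is a hypothetical caller.

```java
// Simplified, hypothetical version of the BlockReader interface (the real one has more methods).
interface SimpleBlockReader {
    int read(byte[] buf, int off, int len);

    String implName();
}

// Shared in-memory behaviour for the stubs below; real readers use local files or sockets.
abstract class InMemoryReader implements SimpleBlockReader {
    private final byte[] data;
    private int pos = 0;

    InMemoryReader(byte[] data) {
        this.data = data;
    }

    @Override
    public int read(byte[] buf, int off, int len) {
        if (len == 0) {
            return 0;
        }
        if (pos >= data.length) {
            return -1; // no more block data
        }
        int n = Math.min(len, data.length - pos);
        System.arraycopy(data, pos, buf, off, n);
        pos += n;
        return n;
    }
}

// One stub per real implementation; they differ only in the name they report.
class LegacyLocalStub extends InMemoryReader {
    LegacyLocalStub(byte[] d) { super(d); }
    @Override public String implName() { return "BlockReaderLocalLegacy"; }
}

class LocalStub extends InMemoryReader {
    LocalStub(byte[] d) { super(d); }
    @Override public String implName() { return "BlockReaderLocal"; }
}

class Remote2Stub extends InMemoryReader {
    Remote2Stub(byte[] d) { super(d); }
    @Override public String implName() { return "RemoteBlockReader2"; }
}

// Caller code depends only on the interface.
class ReaderClient {
    // Fill out as far as possible; returns the number of bytes read.
    static int readFully(SimpleBlockReader r, byte[] out) {
        int total = 0;
        while (total < out.length) {
            int n = r.read(out, total, out.length - total);
            if (n < 0) {
                break;
            }
            total += n;
        }
        return total;
    }
}
```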

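The first two are short-circuit readers. The first is the older implementation and has since been deprecated. Short-circuit reading applies when the client and the DataNode run on the same server. In that case the data does not need to travel over the network, and the client reads the block data directly from local disk. The two generations of short-circuit read work as follows.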

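The legacy approach (HDFS-2246):

The client obtains, via RPC, the absolute paths of the block data file and its corresponding checksum file on the DataNode. It then reads the two files directly through those paths.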
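The legacy scheme boils down to the client opening local paths it has been told about. The sketch below makes two assumptions. First, a hypothetical `LocalPathInfo` class stands in for the RPC reply, whose real type is not shown in this post. Second, ordinary files stand in for the DataNode's block and checksum files. The client performs a positional read of a byte range straight from the block file, and the DataNode process is not involved in the read itself.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical stand-in for the legacy RPC reply: absolute paths of the
// block data file and its checksum (meta) file on the DataNode's disk.
final class LocalPathInfo {
    final Path blockPath;
    final Path metaPath;

    LocalPathInfo(Path blockPath, Path metaPath) {
        this.blockPath = blockPath;
        this.metaPath = metaPath;
    }
}

class LegacyLocalReadSketch {
    // Read len bytes of block data starting at offset, directly by path.
    // Returns fewer bytes if the file ends first.
    static byte[] readBlockRange(LocalPathInfo info, long offset, int len) throws IOException {
        try (FileChannel ch = FileChannel.open(info.blockPath, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(len);
            long pos = offset;
            while (buf.hasRemaining()) {
                int n = ch.read(buf, pos); // positional read; -1 at end of file
                if (n < 0) {
                    break;
                }
                pos += n;
            }
            buf.flip();
            byte[] out = new byte[buf.remaining()];
            buf.get(out);
            return out;
        }
    }

    // The checksum file is opened the same way, by its absolute path.
    static byte[] readMeta(LocalPathInfo info) throws IOException {
        return Files.readAllBytes(info.metaPath);
    }
}
```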

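The new approach (HDFS-347):

It relies on UNIX domain sockets, an inter-process communication mechanism. A domain socket lets two processes on the same machine communicate through a socket interface, and it can also pass file descriptors between processes. The DataNode process passes the descriptors of the block data file and its checksum file to the client process over a domain socket. The client then reads the data through those descriptors.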
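Java's standard library cannot send file descriptors over a UNIX domain socket; Hadoop uses its own native DomainSocket class for that. The sketch below therefore only models the idea inside a single process. A hypothetical `FakeDataNode` opens the block file itself and hands out an open stream. A hypothetical `FdClient` then reads through that stream's `FileDescriptor` and never learns the file path. Apart from the JDK classes, every name here is an assumption for illustration.

```java
import java.io.FileDescriptor;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical DataNode side: it opens the block file itself and gives the
// client an open stream/descriptor, never the path.
class FakeDataNode {
    private final Path blockPath;

    FakeDataNode(Path blockPath) {
        this.blockPath = blockPath;
    }

    // Stands in for "send the descriptor over the domain socket".
    FileInputStream openForClient() throws IOException {
        return new FileInputStream(blockPath.toFile());
    }
}

class FdClient {
    // Client side: read up to len bytes through a descriptor it was handed.
    // The stream is not closed here because it shares the descriptor with its owner.
    static byte[] readViaDescriptor(FileDescriptor fd, int len) throws IOException {
        FileInputStream in = new FileInputStream(fd); // no path involved
        byte[] buf = new byte[len];
        int total = 0;
        while (total < len) {
            int n = in.read(buf, total, len - total);
            if (n < 0) {
                break;
            }
            total += n;
        }
        return Arrays.copyOf(buf, total);
    }
}
```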

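The legacy approach hands the files' absolute paths to the client, and this also allows outside processes to write to the files on the DataNode, which is a security problem. The new approach does not have this problem. The domain socket can pass read-only file descriptors to the client, which prevents outside modification of the files.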
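The security argument rests on the descriptor being opened read-only. The JDK shows the same property at the API level: a `FileChannel` opened with only `StandardOpenOption.READ` rejects writes with `NonWritableChannelException`, while reads still succeed. This is only an analogy to the OS-level read-only descriptor the DataNode passes, not HDFS code. The `ReadOnlyDescriptorDemo` name is hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.NonWritableChannelException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ReadOnlyDescriptorDemo {
    // Returns true if a write through a read-only channel was rejected.
    static boolean writeIsRejected(Path blockFile) throws IOException {
        try (FileChannel ch = FileChannel.open(blockFile, StandardOpenOption.READ)) {
            try {
                ch.write(ByteBuffer.wrap(new byte[]{1}));
                return false; // the write went through (not expected for a READ-only channel)
            } catch (NonWritableChannelException e) {
                return true;  // read-only handle: modification refused
            }
        }
    }

    // Reading through the same kind of handle still works.
    static int firstByte(Path blockFile) throws IOException {
        try (FileChannel ch = FileChannel.open(blockFile, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(1);
            return ch.read(buf) == 1 ? (buf.get(0) & 0xff) : -1;
        }
    }
}
```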

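The third item covers remote reads. RemoteBlockReader2 reads the block data from the DataNode over either a UNIX domain socket or TCP.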
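In the remote case, the client pulls block bytes from the DataNode over a socket instead of opening the file itself. The sketch below uses a loopback TCP connection with a made-up framing: a 4-byte length followed by the payload. The real DataTransferProtocol is considerably richer, with operation headers, packets and checksums. Treat the framing and the `ToyDataNodeServer` / `ToyRemoteReader` names as hypothetical.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical DataNode stand-in: serves one block to one client on loopback.
class ToyDataNodeServer implements AutoCloseable {
    private final ServerSocket server;

    ToyDataNodeServer(byte[] block) throws IOException {
        ServerSocket ss = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
        server = ss;
        Thread worker = new Thread(() -> {
            try (Socket s = ss.accept();
                 DataOutputStream out = new DataOutputStream(s.getOutputStream())) {
                out.writeInt(block.length); // made-up framing: 4-byte length prefix
                out.write(block);           // then the raw block bytes
            } catch (IOException ignored) {
                // e.g. the server was closed before any client connected
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    int port() {
        return server.getLocalPort();
    }

    @Override
    public void close() throws IOException {
        server.close();
    }
}

// Hypothetical client-side reader: connect, read the length, then the payload.
class ToyRemoteReader {
    static byte[] fetch(int port) throws IOException {
        try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port);
             DataInputStream in = new DataInputStream(s.getInputStream())) {
            byte[] block = new byte[in.readInt()];
            in.readFully(block); // keeps reading until every byte has arrived
            return block;
        }
    }
}
```

Only the transport differs between the two remote paths, so the same length-then-payload reader would also work over a UNIX domain socket. On JDK 16 and later, `SocketChannel.open(UnixDomainSocketAddress.of(path))` provides such a connection.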

In the end, every file read comes down to the read functions implemented by BlockReaderLocalLegacy, BlockReaderLocal and RemoteBlockReader2. We will analyze each of these read functions in later posts.