Although Hadoop does not provide the full POSIX file API, the basic operations it does offer (open, create, delete, write, seek, and read) are enough to work with files comfortably. The open call ultimately returns a DFSInputStream object; below we will look at its constructor and then at its openInfo method, which is the core of the whole open path. Inside openInfo, callGetBlockLocations makes an RPC call to the namenode to fetch the locations of the file's first prefetchSize blocks (dfs.read.prefetch.size in the configuration; the default covers the first 10 blocks) and caches them in the stream. The subsequent updateBlockInfo call compares the namenode's record of the last block against what the datanode reports, and on a mismatch the datanode's information wins, because the namenode's and datanodes' views of a file can be out of sync.

A typical piece of Hadoop code that opens a file and reads from it looks like this:
```java
FileSystem hdfs = hdfsPath.getFileSystem(conf);
FSDataInputStream inFsData = hdfs.open(p);
inFsData.seek(place);
long value = inFsData.readLong();
```
Here hdfs is a FileSystem instance. FileSystem is an abstract class; depending on the URI in conf, getFileSystem may return an instance backed by the local filesystem or by the distributed one. For actual HDFS operations the concrete class is DistributedFileSystem.
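This scheme-based dispatch can be sketched in isolation. Note this is an illustration only: Hadoop really resolves the implementation class from configuration keys (such as fs.hdfs.impl) rather than a hard-coded table, and the map below is a stand-in for that lookup.

```java
import java.net.URI;
import java.util.Map;

// Illustrative sketch of FileSystem.get's scheme-based dispatch.
// The map stands in for Hadoop's configuration-driven lookup.
public class SchemeDispatch {
    static final Map<String, String> IMPLS = Map.of(
            "file", "org.apache.hadoop.fs.LocalFileSystem",
            "hdfs", "org.apache.hadoop.hdfs.DistributedFileSystem");

    static String implFor(URI uri) {
        // A null scheme falls back to the default filesystem (here: local).
        String scheme = uri.getScheme() == null ? "file" : uri.getScheme();
        return IMPLS.getOrDefault(scheme, "unknown");
    }

    public static void main(String[] args) {
        System.out.println(implFor(URI.create("hdfs://namenode:9000/user/a")));
        System.out.println(implFor(URI.create("/tmp/local.txt")));
    }
}
```

The point is that the same client code works against either backend; only the URI scheme decides which implementation handles the calls.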
Let's look at DistributedFileSystem's open:
```java
public FSDataInputStream open(Path f, int bufferSize) throws IOException {
  statistics.incrementReadOps(1);
  return new DFSClient.DFSDataInputStream(
        dfs.open(getPathName(f), bufferSize, verifyChecksum, statistics));
}
```
As you can see, open returns an FSDataInputStream. Internally it constructs a DFSDataInputStream (an inner class of DFSClient), one of whose constructor arguments is the return value of DFSClient's own open function, shown next:
```java
public DFSInputStream open(String src, int buffersize, boolean verifyChecksum,
                           FileSystem.Statistics stats) throws IOException {
  checkOpen();
  // Get block info from namenode
  return new DFSInputStream(src, buffersize, verifyChecksum);
}
```
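The layering here (FSDataInputStream wrapping the client-level stream) mirrors the standard java.io decorator pattern. A minimal analogue with plain java.io classes, purely for illustration: DataInputStream adds typed reads such as readLong() on top of a raw byte stream, much as FSDataInputStream adds them on top of the positionable DFSInputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

// Decorator analogue: a typed reader layered over a raw byte stream.
public class LayeredRead {
    public static void main(String[] args) throws IOException {
        // Two big-endian longs: 7 and 42.
        byte[] raw = ByteBuffer.allocate(16).putLong(7L).putLong(42L).array();
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw));
        in.skipBytes(8);                   // crude stand-in for seek(8)
        System.out.println(in.readLong()); // reads the second long: 42
    }
}
```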
That open call returns a DFSInputStream; its constructor is:

```java
DFSInputStream(String src, int buffersize, boolean verifyChecksum)
    throws IOException {
  this.verifyChecksum = verifyChecksum;
  this.buffersize = buffersize;
  this.src = src;
  prefetchSize = conf.getLong("dfs.read.prefetch.size", prefetchSize);
  openInfo();
}
```
The constructor ends by calling openInfo, the core of the open path:

```java
synchronized void openInfo() throws IOException {
  LocatedBlocks newInfo = callGetBlockLocations(namenode, src, 0, prefetchSize);
  if (newInfo == null) {
    throw new FileNotFoundException("File does not exist: " + src);
  }

  // I think this check is not correct. A file could have been appended to
  // between two calls to openInfo().
  if (locatedBlocks != null && !locatedBlocks.isUnderConstruction() &&
      !newInfo.isUnderConstruction()) {
    Iterator<LocatedBlock> oldIter = locatedBlocks.getLocatedBlocks().iterator();
    Iterator<LocatedBlock> newIter = newInfo.getLocatedBlocks().iterator();
    while (oldIter.hasNext() && newIter.hasNext()) {
      if (!oldIter.next().getBlock().equals(newIter.next().getBlock())) {
        throw new IOException("Blocklist for " + src + " has changed!");
      }
    }
  }
  updateBlockInfo(newInfo);
  this.locatedBlocks = newInfo;
  this.currentNode = null;
}
```
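The blocklist-consistency check in openInfo can be sketched in isolation: any mismatch within the common prefix of the cached and the freshly fetched block lists signals that the file's blocks have changed, while extra trailing blocks in the new list (an append) pass. This is a standalone illustration using plain block IDs, not Hadoop's LocatedBlock type.

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.List;

// Standalone sketch of openInfo's pairwise blocklist comparison;
// long block IDs stand in for LocatedBlock objects.
public class BlocklistCheck {
    // Throws if the two lists disagree anywhere in their common prefix.
    static void checkUnchanged(List<Long> oldBlocks, List<Long> newBlocks)
            throws IOException {
        Iterator<Long> oldIter = oldBlocks.iterator();
        Iterator<Long> newIter = newBlocks.iterator();
        while (oldIter.hasNext() && newIter.hasNext()) {
            if (!oldIter.next().equals(newIter.next())) {
                throw new IOException("Blocklist has changed!");
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Appended block only: common prefix matches, so no exception.
        checkUnchanged(List.of(1L, 2L), List.of(1L, 2L, 3L));
        try {
            checkUnchanged(List.of(1L, 2L), List.of(1L, 9L));
        } catch (IOException e) {
            System.out.println("detected change");
        }
    }
}
```

This also makes the in-code comment's complaint concrete: a legitimate append grows the new list, which this prefix comparison tolerates, so the check's real target is a block being replaced in place.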
The subsequent seek and read calls all operate on this stream. Here is DFSInputStream's seek:
```java
public synchronized void seek(long targetPos) throws IOException {
  if (targetPos > getFileLength()) {
    throw new IOException("Cannot seek after EOF");
  }
  boolean done = false;
  if (pos <= targetPos && targetPos <= blockEnd) {
    //
    // If this seek is to a positive position in the current
    // block, and this piece of data might already be lying in
    // the TCP buffer, then just eat up the intervening data.
    //
    int diff = (int)(targetPos - pos);
    if (diff <= TCP_WINDOW_SIZE) {
      try {
        pos += blockReader.skip(diff);
        if (pos == targetPos) {
          done = true;
        }
      } catch (IOException e) { // make following read retry
        LOG.debug("Exception while seek to " + targetPos + " from "
            + currentBlock + " of " + src + " from " + currentNode +
            ": " + StringUtils.stringifyException(e));
      }
    }
  }
  if (!done) {
    pos = targetPos;
    blockEnd = -1;
  }
}
```
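The decision logic is worth restating: a short forward seek within the current block (at most TCP_WINDOW_SIZE bytes ahead) is served by skipping data that may already be in flight in the TCP buffer, which is cheaper than tearing down and re-establishing the datanode connection; any other seek just records the target and sets blockEnd to -1 so the next read reconnects. A standalone sketch of that decision, assuming the 128 KB window value used by DFSClient (fields and the skip call are simplified stand-ins, not Hadoop's actual members):

```java
// Sketch of seek's fast path vs. reconnect path.
public class SeekSketch {
    static final int TCP_WINDOW_SIZE = 128 * 1024; // assumed DFSClient constant

    long pos;      // current position in the file
    long blockEnd; // last byte of the block the reader is connected to (-1 = none)

    // Returns true if the seek was satisfied by skipping buffered bytes.
    boolean seek(long targetPos) {
        if (pos <= targetPos && targetPos <= blockEnd) {
            long diff = targetPos - pos;
            if (diff <= TCP_WINDOW_SIZE) {
                pos = targetPos; // stand-in for blockReader.skip(diff)
                return true;
            }
        }
        pos = targetPos;
        blockEnd = -1;           // force reconnect on the next read
        return false;
    }

    public static void main(String[] args) {
        SeekSketch s = new SeekSketch();
        s.pos = 0;
        s.blockEnd = 64 * 1024 * 1024 - 1;            // inside a 64 MB block
        System.out.println(s.seek(1024));             // short hop: skip -> true
        System.out.println(s.seek(32 * 1024 * 1024)); // far jump: reconnect -> false
    }
}
```

Note that a backward seek, even by one byte, always takes the reconnect path: skipping can only move forward through the TCP stream.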