Hadoop Source Code Analysis: How the NameNode and DataNode Handle a File Read

我的十六亩三分地

Category: Hadoop

Article link: https://blog.csdn.net/workformywork/article/details/21783861

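This article walks through how a Hadoop client obtains a file's block locations from the NameNode and then reads the block contents from DataNodes. The client first calls the NameNode's getBlockLocations() method to get a LocatedBlocks object that describes which nodes hold each block. On the NameNode, the call passes down through FSNamesystem, which locates the file's blocks and sorts their locations. The client then uses that information to contact a DataNode, which handles the read request through the DataXceiverServer and DataXceiver classes and sends the data back. The flow covers block location, the transfer strategy, and efficient data transfer techniques such as zero-copy.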

Getting block location information from the NameNode

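Before the client opens a streaming TCP connection to a DataNode and reads file data, it has to find out where that data lives. The client therefore first calls the remote method ClientProtocol.getBlockLocations() from DFSClient.callGetBlockLocations(). The call returns a LocatedBlocks object containing a list of LocatedBlock instances, which tell the client which DataNodes to fetch the data from. On the NameNode side the call is handled by NameNode.getBlockLocations(), which delegates the actual work to the method of the same name in FSNamesystem. FSNamesystem has three overloads of this method: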

LocatedBlocks getBlockLocations(String clientMachine, String src,
      long offset, long length) throws IOException {
   
    LocatedBlocks blocks = getBlockLocations(src, offset, length, true, true,
        true);
    if (blocks != null) {
      // blocks is non-null: sort the datanodes that hold each block
      //sort the blocks
      // In some deployment cases, cluster is with separation of task tracker 
      // and datanode which means client machines will not always be recognized 
      // as known data nodes, so here we should try to get node (but not 
      // datanode only) for locality based sort.
      Node client = host2DataNodeMap.getDatanodeByHost(
          clientMachine);
      if (client == null) {
   
        List<String> hosts = new ArrayList<String> (1);
        hosts.add(clientMachine);
        String rName = dnsToSwitchMapping.resolve(hosts).get(0);
        if (rName != null)
          client = new NodeBase(clientMachine, rName);
      }   

      DFSUtil.StaleComparator comparator = null;
      if (avoidStaleDataNodesForRead) {
   
        comparator = new DFSUtil.StaleComparator(staleInterval);
      }
      // Note: the last block is also included and sorted
      for (LocatedBlock b : blocks.getLocatedBlocks()) {
   
        clusterMap.pseudoSortByDistance(client, b.getLocations());
        if (avoidStaleDataNodesForRead) {
   
          Arrays.sort(b.getLocations(), comparator);
        }
      }
    }
    return blocks;
  }

  /**
   * Get block locations within the specified range.
   * @see ClientProtocol#getBlockLocations(String, long, long)
   */
  public LocatedBlocks getBlockLocations(String src, long offset, long length
      ) throws IOException {
   
    return getBlockLocations(src, offset, length, false, true, true);
  }

  /**
   * Get block locations within the specified range.
   * @see ClientProtocol#getBlockLocations(String, long, long)
   */
  public LocatedBlocks getBlockLocations(String src, long offset, long length,
      boolean doAccessTime, boolean needBlockToken, boolean checkSafeMode)
      throws IOException {
   
    if (isPermissionEnabled) {
      // check read permission
      FSPermissionChecker pc = getPermissionChecker();
      checkPathAccess(pc, src, FsAction.READ);
    }

    if (offset < 0) {
   
      throw new IOException("Negative offset is not supported. File: " + src );
    }
    if (length < 0) {
   
      throw new IOException("Negative length is not supported. File: " + src );
    }
    final LocatedBlocks ret = getBlockLocationsInternal(src, 
        offset, length, Integer.MAX_VALUE, doAccessTime, needBlockToken);  
    if (auditLog.isInfoEnabled() && isExternalInvocation()) {
   
      logAuditEvent(UserGroupInformation.getCurrentUser(),
                    Server.getRemoteIp(),
                    "open", src, null, null);
    }
    if (checkSafeMode && isInSafeMode()) {
   
      for (LocatedBlock b : ret.getLocatedBlocks()) {
   
        // if safemode & no block locations yet then throw safemodeException
        if ((b.getLocations() == null) || (b.getLocations().length == 0)) {
   
          throw new SafeModeException("Zero blocklocations for " + src,
              safeMode);
        }
      }
    }
    return ret;
  }
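
The stale-node step in the first overload runs Arrays.sort() after the distance-based ordering. Because Arrays.sort() is stable for object arrays, the distance order is preserved inside the fresh group and inside the stale group. The following is a minimal sketch of that effect, not Hadoop code: Replica, staleLast() and the timestamps are hypothetical stand-ins, and it assumes the comparator simply ranks nodes whose last heartbeat is older than the stale interval after all other nodes.

```java
import java.util.Arrays;
import java.util.Comparator;

class StaleSortSketch {
    /** Hypothetical stand-in for a datanode: a name and the time of its last heartbeat. */
    static final class Replica {
        final String name;
        final long lastHeartbeatMillis;

        Replica(String name, long lastHeartbeatMillis) {
            this.name = name;
            this.lastHeartbeatMillis = lastHeartbeatMillis;
        }
    }

    /** Assumed rule: fresh replicas compare lower than stale ones; equal staleness compares as 0. */
    static Comparator<Replica> staleLast(long nowMillis, long staleIntervalMillis) {
        return (a, b) -> Boolean.compare(
            nowMillis - a.lastHeartbeatMillis > staleIntervalMillis,
            nowMillis - b.lastHeartbeatMillis > staleIntervalMillis);
    }

    /** Sorts a copy of the replicas and returns their names in the resulting order. */
    static String[] sortedNames(Replica[] replicas, long nowMillis, long staleIntervalMillis) {
        Replica[] copy = replicas.clone();
        // Arrays.sort on objects is stable, so the input order survives within each group.
        Arrays.sort(copy, staleLast(nowMillis, staleIntervalMillis));
        return Arrays.stream(copy).map(r -> r.name).toArray(String[]::new);
    }

    public static void main(String[] args) {
        // Input is assumed to be already ordered by distance to the client.
        Replica[] byDistance = {
            new Replica("A", 50_000L),  // stale: last heartbeat 50 s ago
            new Replica("B", 95_000L),  // fresh
            new Replica("D", 10_000L),  // stale
            new Replica("C", 90_000L),  // fresh
        };
        System.out.println(String.join(",", sortedNames(byDistance, 100_000L, 30_000L)));
    }
}
```

With a 30-second stale interval, the two fresh replicas move ahead of the two stale ones, and each pair keeps its original relative order.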

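Of the three FSNamesystem overloads, the first two both delegate to the third. The first one, which takes the clientMachine argument, additionally "sorts" each block's node list relative to the client once the blocks have been fetched. The "sort" is done by clusterMap.pseudoSortByDistance():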

public void pseudoSortByDistance( Node reader, Node[] nodes ) {
   
    int tempIndex = 0;
    if (reader != null ) {
   
      int localRackNode = -1;
      //scan the array to find the local node & local rack node
      for(int i=0; i<nodes.length; i++) {
        // walk nodes to see whether reader itself is one of them
        if(tempIndex == 0 && reader == nodes[i]) {
          //local node
          //swap the local node and the node at position 0
          // node i is the same machine as the client
          if( i != 0 ) {
   
            swap(nodes, tempIndex, i);
          }
          tempIndex=1;
          if(localRackNode != -1 ) {
            if(localRackNode == 0) {
              // localRackNode == 0 means that before the swap, node 0 was the
              // local-rack node; the swap has just moved it to index i
              localRackNode = i;
            }
            // node 0 is now reader and a local-rack node is known, so stop scanning
            break;
          }
        } else if(localRackNode == -1 && isOnSameRack(reader, nodes[i])) {
          //local rack: node i is on the same rack as reader
          localRackNode = i;
          if(tempIndex != 0 ) break; // tempIndex != 0 means reader was already found in nodes
        }
      }
      // tempIndex == 1 if reader is in nodes, otherwise 0. If localRackNode != -1,
      // that index holds a node on reader's rack, and it is swapped into position
      // tempIndex. So when reader is in nodes, node 0 is reader and node 1 is the
      // local-rack node; when reader is not in nodes, node 0 is the local-rack node.
      // If neither exists, a random node is picked below.
      if(localRackNode != -1 && localRackNode != tempIndex ) {
   
        swap(nodes, tempIndex, localRackNode);
        tempIndex++;
      }
    }
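    // tempIndex == 0: reader is not in nodes and no local-rack node was swapped to the
    // front (as written, this also covers a local-rack node already sitting at index 0)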
    if(tempIndex == 0 && nodes.length != 0) {
   
      swap(nodes, 0, r.nextInt(nodes.length));
    }
  }

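The "sorting" rule is as follows. If the reader is in the nodes list, it is moved to position 0. If nodes contains a node on the same rack as the reader (localRackNode), that node is placed right after the reader; when the reader is not in nodes, think of it as sitting at position -1, so the local-rack node lands at position 0. If there is no same-rack node either, a randomly chosen node from nodes is placed at position 0.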
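
To make the rule concrete, here is a small self-contained sketch that mirrors the control flow of the method quoted above. It is an illustration under assumptions, not the Hadoop class: Node here is a hypothetical two-field type, rack membership is a plain string comparison, and the random source is passed in as a parameter.

```java
import java.util.Random;

class PseudoSortSketch {
    /** Hypothetical minimal node: a host name and a rack path. */
    static final class Node {
        final String host;
        final String rack;

        Node(String host, String rack) {
            this.host = host;
            this.rack = rack;
        }
    }

    static void swap(Node[] nodes, int i, int j) {
        Node tmp = nodes[i];
        nodes[i] = nodes[j];
        nodes[j] = tmp;
    }

    /** Moves the reader (if present) to index 0 and one same-rack node right after it. */
    static void pseudoSortByDistance(Node reader, Node[] nodes, Random random) {
        int tempIndex = 0;
        if (reader != null) {
            int localRackNode = -1;
            for (int i = 0; i < nodes.length; i++) {
                if (tempIndex == 0 && reader == nodes[i]) {
                    // The reader itself holds a replica: bring it to the front.
                    if (i != 0) {
                        swap(nodes, tempIndex, i);
                    }
                    tempIndex = 1;
                    if (localRackNode != -1) {
                        if (localRackNode == 0) {
                            localRackNode = i; // the swap moved the rack-local node to i
                        }
                        break;
                    }
                } else if (localRackNode == -1 && reader.rack.equals(nodes[i].rack)) {
                    localRackNode = i;
                    if (tempIndex != 0) {
                        break;
                    }
                }
            }
            if (localRackNode != -1 && localRackNode != tempIndex) {
                swap(nodes, tempIndex, localRackNode);
                tempIndex++;
            }
        }
        // Nothing was moved to the front: pick a random first node.
        if (tempIndex == 0 && nodes.length != 0) {
            swap(nodes, 0, random.nextInt(nodes.length));
        }
    }

    /** Joins the host names so an ordering is easy to print or compare. */
    static String hosts(Node[] nodes) {
        StringBuilder sb = new StringBuilder();
        for (Node n : nodes) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(n.host);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node n1 = new Node("n1", "/r2");
        Node n2 = new Node("n2", "/r1");
        Node n3 = new Node("n3", "/r1");
        Node[] nodes = {n1, n2, n3};
        pseudoSortByDistance(n3, nodes, new Random()); // n3 is both reader and replica holder
        System.out.println(hosts(nodes));
    }
}
```

Only the first one or two positions are arranged; the remaining nodes stay wherever the swaps leave them, which is why the article puts "sort" in quotation marks.
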
The third overload of FSNamesystem.getBlockLocations() calls FSNamesystem.getBlockLocationsInternal(), which does the actual work of looking up the file's blocks in the NameNode's directory tree. Its code is as follows:

private synchronized LocatedBlocks getBlockLocationsInternal(String src,
                                                       long offset, 
                                                       long length,
                                                       int nrBlocksToReturn,
                                                       boolean doAccessTime, 
                                                       boolean needBlockToken)
                                                       throws IOException {
   
    // get the last node on the src path, i.e. the file's inode
    INodeFile inode = dir.getFileINode(src);
    if(inode == null) {
   
      return null;
    }
    if (doAccessTime && isAccessTimeSupported()) {
   
      // update the last access time
      dir.setTimes(src, inode, -1, now(), false);
    }
    // get the file's blocks
    Block[] blocks = inode.getBlocks();
    if (blocks == null) {
   
      r