HDFS Replica Placement Policy: A Source Code Analysis

极伪

Tags: big data, hadoop, hdfs, source code

Article link: https://blog.csdn.net/weixin_42473019/article/details/113431662

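This article analyzes the replica placement policy of Hadoop HDFS, including the selection logic for the 1st through 4th replicas. The investigation started when block writes failed on a cluster that was heavily loaded but not so full that it could no longer store data. Working through the source, the article explains the `chooseTarget` methods reached from `BlockManager` and implemented in `BlockPlacementPolicyDefault`, the fallback steps in replica selection, and the conditions a node must meet to be chosen, such as a matching storage type, sufficient capacity, and node state. It also covers how `scheduledSize` is calculated and how the pipeline is built so that the distance between replicas stays small.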

Background

A while ago, our cluster reported the following error when writing a block:

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.ipc.RemoteException(java.io.IOException): 
File /tmp/xxxx.tmp could only be replicated to 0 nodes instead of minReplication (=1).  
There are 983 datanode(s) running and 983 node(s) are excluded in this operation.

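The cluster was indeed more than 90% full, but it was unlikely that all of its nearly one thousand nodes had run out of space. So I read through the relevant code and, along the way, summarized Hadoop's replica placement policy. This cluster runs Hadoop 2.6.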

HDFS Replica Placement Policy

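1st replica: if the client issuing the write runs on one of the datanodes, the replica is stored on that local node; otherwise a datanode is chosen at random from the cluster.
2nd replica: stored on a rack different from the rack that holds the first replica.
3rd replica: stored on the same rack as the second replica, but on a different node.
4th replica and beyond: stored on randomly chosen datanodes.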
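
The four rules above can be sketched on a toy topology. This is only an illustration, not Hadoop code: the `ReplicaPlacementSketch` class, its `Node` type, and the `fits` and `place` methods are hypothetical names, a `null` writer stands for a client that is not a datanode, and the sketch simply stops when no candidate fits instead of falling back to looser rules as the real policy does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

/** Toy model of the default placement rules; not Hadoop code. */
final class ReplicaPlacementSketch {

  /** Hypothetical stand-in for a datanode: a host name plus its rack. */
  static final class Node {
    final String name;
    final String rack;
    Node(String name, String rack) { this.name = name; this.rack = rack; }
    @Override public String toString() { return rack + "/" + name; }
  }

  /** Returns true if node n is acceptable as replica number index (0-based). */
  static boolean fits(int index, Node n, Node writer, List<Node> chosen) {
    switch (index) {
      case 0:  // 1st: the writer's own node if it is a datanode, else any node
        return writer == null || n == writer;
      case 1:  // 2nd: a rack different from the 1st replica's rack
        return !n.rack.equals(chosen.get(0).rack);
      case 2:  // 3rd: same rack as the 2nd (already chosen nodes are skipped in place)
        return n.rack.equals(chosen.get(1).rack);
      default: // 4th and later: any remaining node
        return true;
    }
  }

  /** Picks up to `replicas` nodes, one rule per replica index. */
  static List<Node> place(List<Node> cluster, Node writer, int replicas, Random rnd) {
    List<Node> chosen = new ArrayList<Node>();
    for (int i = 0; i < replicas; i++) {
      List<Node> candidates = new ArrayList<Node>();
      for (Node n : cluster) {
        if (!chosen.contains(n) && fits(i, n, writer, chosen)) {
          candidates.add(n);
        }
      }
      if (candidates.isEmpty()) {
        break; // the real policy relaxes its rules here; this sketch just stops
      }
      chosen.add(candidates.get(rnd.nextInt(candidates.size())));
    }
    return chosen;
  }

  public static void main(String[] args) {
    Node a1 = new Node("a1", "/rackA");
    Node a2 = new Node("a2", "/rackA");
    Node b1 = new Node("b1", "/rackB");
    Node b2 = new Node("b2", "/rackB");
    List<Node> cluster = Arrays.asList(a1, a2, b1, b2);
    // The writer runs on a1, so the first replica lands on a1.
    System.out.println(place(cluster, a1, 3, new Random()));
  }
}
```

With the writer on `a1`, the first replica is always `a1`, and the second and third are the two nodes of the other rack in some order.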

Source Code Analysis

BlockManager.java
The error above is thrown by the method below. Next, we look at the chooseTarget method it calls.

/**
   * Choose target datanodes for creating a new block.
   * 
   * @throws IOException
   *           if the number of targets < minimum replication.
   * @see BlockPlacementPolicy#chooseTarget(String, int, Node,
   *      Set, long, List, BlockStoragePolicy)
   */
  public DatanodeStorageInfo[] chooseTarget4NewBlock(final String src,
      final int numOfReplicas, final Node client,
      final Set<Node> excludedNodes,
      final long blocksize,
      final List<String> favoredNodes,
      final byte storagePolicyID) throws IOException {
    List<DatanodeDescriptor> favoredDatanodeDescriptors = 
        getDatanodeDescriptors(favoredNodes);
    final BlockStoragePolicy storagePolicy = storagePolicySuite.getPolicy(storagePolicyID);
    // Call blockplacement's chooseTarget method
    final DatanodeStorageInfo[] targets = blockplacement.chooseTarget(src,
        numOfReplicas, client, excludedNodes, blocksize, 
        favoredDatanodeDescriptors, storagePolicy);
    // If fewer targets than minReplication were chosen, throw an IOException
    if (targets.length < minReplication) {
      throw new IOException("File " + src + " could only be replicated to "
          + targets.length + " nodes instead of minReplication (="
          + minReplication + ").  There are "
          + getDatanodeManager().getNetworkTopology().getNumOfLeaves()
          + " datanode(s) running and "
          + (excludedNodes == null? "no": excludedNodes.size())
          + " node(s) are excluded in this operation.");
    }
    return targets;
  }
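
One detail helps when reading the message "983 datanode(s) running and 983 node(s) are excluded". As far as I can tell from the upstream random-selection code, a candidate is added to `excludedNodes` as soon as it is drawn and only then checked for suitability, so the excluded count also includes nodes the policy tried and rejected. The following toy sketch illustrates that accounting under this assumption; `ExcludedNodesSketch`, its `Node` type, and the simplified free-space check in `isGoodTarget` are hypothetical stand-ins, not Hadoop's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of how the excluded set can grow during selection; not Hadoop code. */
final class ExcludedNodesSketch {

  /** Hypothetical datanode carrying only the attribute this sketch checks. */
  static final class Node {
    final String name;
    final long remainingBytes;
    Node(String name, long remainingBytes) {
      this.name = name;
      this.remainingBytes = remainingBytes;
    }
  }

  /** Simplified acceptance test; the real policy checks far more than free space. */
  static boolean isGoodTarget(Node n, long blockSize) {
    return n.remainingBytes >= blockSize;
  }

  /** Tries candidates in order; each one is marked excluded before it is checked. */
  static List<Node> choose(List<Node> cluster, int wanted, long blockSize, Set<Node> excluded) {
    List<Node> targets = new ArrayList<Node>();
    for (Node n : cluster) { // assumption: the real code draws candidates at random
      if (targets.size() == wanted) {
        break;
      }
      if (!excluded.add(n)) {
        continue; // was already excluded, skip it
      }
      if (isGoodTarget(n, blockSize)) {
        targets.add(n);
      }
    }
    return targets;
  }

  public static void main(String[] args) {
    long blockSize = 128L * 1024 * 1024;
    List<Node> cluster = Arrays.asList(
        new Node("dn1", 1024L), new Node("dn2", 2048L), new Node("dn3", 0L));
    Set<Node> excluded = new HashSet<Node>(); // the client excluded nothing
    List<Node> targets = choose(cluster, 1, blockSize, excluded);
    System.out.println("replicated to " + targets.size() + " nodes, "
        + cluster.size() + " datanode(s) running and "
        + excluded.size() + " node(s) are excluded");
  }
}
```

Read this way, an excluded count equal to the number of running datanodes suggests that every node was considered and rejected, not that the client asked to exclude them all.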

BlockPlacementPolicyDefault.java

  /** This is the implementation. */
  private DatanodeStorageInfo[] chooseTarget(int numOfReplicas,
                                    Node writer,
                                    List<DatanodeStorageInfo> chosenStorage,
                                    boolean returnChosenNodes,
                                    Set<Node> excludedNodes,
                                    long blocksize,
                                    final BlockStoragePolicy storagePolicy) {
    // Return an empty array if the replica count or the number of datanodes is 0
    if (numOfReplicas == 0 || clusterMap.getNumOfLeaves()==0) {
      return DatanodeStorageInfo.EMPTY_ARRAY;
    }
    // Initialize the set of excluded nodes
    if (excludedNodes == null) {
      excludedNodes = new HashSet<Node>();
    }
    // Compute the maximum number of replicas allowed per rack
    int[] result = getMaxNodesPerRack(chosenStorage.size(), numOfReplicas);
    numOfReplicas = result[0];
    int maxNodesPerRack = result[1];
    // Initialize the result list with the storages that are already chosen
    final List<DatanodeStorageInfo> results = new ArrayList<DatanodeStorageInfo>(chosenStorage);
    for (DatanodeStorageInfo storage : chosenStorage) {
      // add localMachine and related nodes to excludedNodes
      addToExcludedNodes(storage.getDatanodeDescriptor(), excludedNodes);
    }

    boolean avoidStaleNodes = (stats != null
        && stats.isAvoidingStaleDataNodesForWrite());
    // Call the chooseTarget overload that actually selects the nodes
    final Node localNode = chooseTarget(numOfReplicas, writer, excludedNodes,
        blocksize, maxNodesPerRack, results, avoidStaleNodes, storagePolicy,
        EnumSet.noneOf(StorageType.class), results.isEmpty());
    if (!returnChosenNodes) {
      results.removeAll(chosenStorage);
    }

    // sorting nodes to form a pipeline
    return getPipeline(
        (writer != null && writer instanceof DatanodeDescriptor) ? writer
            : localNode,
        results.toArray(new DatanodeStorageInfo[results.size()]));
  }
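
The final getPipeline call orders the chosen targets so that each hop in the write pipeline is short. A minimal sketch of that idea as a greedy nearest-neighbour ordering is shown below. The `PipelineSketch` class, its `Node` type, and the three-level `distance` function (0 for the same node, 2 for the same rack, 4 otherwise) are assumptions made for illustration, as is starting from the first target when there is no writer.

```java
import java.util.Arrays;

/** Toy greedy pipeline ordering; not Hadoop code. */
final class PipelineSketch {

  /** Hypothetical stand-in for a datanode: a host name plus its rack. */
  static final class Node {
    final String name;
    final String rack;
    Node(String name, String rack) { this.name = name; this.rack = rack; }
    @Override public String toString() { return rack + "/" + name; }
  }

  /** Toy distance: 0 for the same node, 2 within a rack, 4 across racks. */
  static int distance(Node a, Node b) {
    if (a == b) {
      return 0;
    }
    return a.rack.equals(b.rack) ? 2 : 4;
  }

  /** Reorders targets in place: each position gets the node nearest to the previous hop. */
  static Node[] getPipeline(Node writer, Node[] targets) {
    if (targets.length == 0) {
      return targets;
    }
    // Assumption: without a known writer, start from the first target.
    Node current = (writer != null) ? writer : targets[0];
    for (int i = 0; i < targets.length; i++) {
      int best = i;
      for (int j = i + 1; j < targets.length; j++) {
        if (distance(current, targets[j]) < distance(current, targets[best])) {
          best = j;
        }
      }
      Node tmp = targets[i];
      targets[i] = targets[best];
      targets[best] = tmp;
      current = targets[i]; // the next hop is measured from the node just placed
    }
    return targets;
  }

  public static void main(String[] args) {
    Node a1 = new Node("a1", "/rackA");
    Node b1 = new Node("b1", "/rackB");
    Node b2 = new Node("b2", "/rackB");
    Node[] pipeline = getPipeline(a1, new Node[] {b1, a1, b2});
    // prints [/rackA/a1, /rackB/b1, /rackB/b2]
    System.out.println(Arrays.toString(pipeline));
  }
}
```

A greedy ordering like this keeps each individual hop short, but it does not promise the shortest possible total path.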

The getMaxNodesPerRack method (calculates the maximum number of replicas to allocate per rack)

  private int[] getMaxNodesPerRack(int numOfChosen, int numOfReplicas) {
    int clusterSize = clusterMap.getNumOfLeaves();
    int totalNumOfReplicas = numOfChosen + numOfReplicas;
    if (totalNumOfReplicas > clusterSize) {
      numOfReplicas -= (totalNumOfReplicas-clusterSize);
      totalNumOfReplicas = clusterSize;
    }
    // No calculation needed when there is only one rack or picking one node.
    int numOfRacks = clusterMap.getNumOfRacks();
    if (numOfRacks == 1 || totalNumOfReplicas <= 1) {
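
The opening lines of getMaxNodesPerRack shown above cap the request at the cluster size: if the replicas already chosen plus the replicas requested exceed the number of datanodes, the request is reduced by the excess. A minimal standalone sketch of just that step follows; the `ReplicaClampSketch` class and the `clampReplicas` method are illustrative names, not part of Hadoop, and the per-rack limit itself is not reproduced here.

```java
/** Standalone illustration of the cluster-size cap; not Hadoop code. */
final class ReplicaClampSketch {

  /** Returns how many new replicas can still be requested on a cluster of the given size. */
  static int clampReplicas(int numOfChosen, int numOfReplicas, int clusterSize) {
    int totalNumOfReplicas = numOfChosen + numOfReplicas;
    if (totalNumOfReplicas > clusterSize) {
      // Drop the part of the request that the cluster cannot hold.
      numOfReplicas -= (totalNumOfReplicas - clusterSize);
    }
    return numOfReplicas;
  }

  public static void main(String[] args) {
    System.out.println(clampReplicas(0, 3, 2));   // prints 2
    System.out.println(clampReplicas(1, 3, 3));   // prints 2
    System.out.println(clampReplicas(0, 3, 983)); // prints 3
  }
}
```

On a cluster the size of ours (983 datanodes), a request for three replicas passes through this step unchanged, so the cap does not explain the error in the background section.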
