HDFS卷(磁盘)选择策略

在我们目前使用的Hadoop 2.x版本当中,HDFS在写入时有两种选择卷(磁盘)的策略,一是基于轮询的策略(RoundRobinVolumeChoosingPolicy),二是基于可用空间的策略(AvailableSpaceVolumeChoosingPolicy)。

基于轮询的策略

“轮询”是一个在操作系统理论中常见的概念,比如进程调度算法中的轮询算法。其思想就是从对象1遍历到对象n,然后再从1开始。HDFS中轮询策略的源码如下,非常好理解。

在这里插入图片描述

public class RoundRobinVolumeChoosingPolicy<V extends FsVolumeSpi>
    implements VolumeChoosingPolicy<V> {
  public static final Log LOG = LogFactory.getLog(RoundRobinVolumeChoosingPolicy.class);
 
  private int curVolume = 0;
 
  @Override
  public synchronized V chooseVolume(final List<V> volumes, long blockSize)
      throws IOException {
 
    if(volumes.size() < 1) {
      throw new DiskOutOfSpaceException("No more available volumes");
    }
    
    // since volumes could've been removed because of the failure
    // make sure we are not out of bounds
    if(curVolume >= volumes.size()) {
      curVolume = 0;
    }
    
    int startVolume = curVolume;
    long maxAvailable = 0;
    
    while (true) {
      final V volume = volumes.get(curVolume);
      curVolume = (curVolume + 1) % volumes.size();
      long availableVolumeSize = volume.getAvailable();
      if (availableVolumeSize > blockSize) {
        return volume;
      }
      
      if (availableVolumeSize > maxAvailable) {
        maxAvailable = availableVolumeSize;
      }
      
      if (curVolume == startVolume) {
        throw new DiskOutOfSpaceException("Out of space: "
            + "The volume with the most available space (=" + maxAvailable
            + " B) is less than the block size (=" + blockSize + " B).");
      }
    }
  }
}

基于轮询的策略可以保证每个卷的写入次数平衡,但无法保证写入数据量平衡。例如,在一次写过程中,在卷A上写入了1M的块,但在卷B上写入了128M的块,A与B之间的数据量就不平衡了。久而久之,不平衡的现象就会越发严重。

基于可用空间的策略
这个策略比轮询更加聪明一些。它根据一个可用空间的阈值,将卷分为可用空间多的卷和可用空间少的卷两类。然后,会根据一个比较高的概率选择可用空间多的卷。不管选择了哪一类,最终都会采用轮询策略来写入这一类卷。可用空间阈值和选择卷的概率都是可以通过参数设定的。

其源码如下。

public class AvailableSpaceVolumeChoosingPolicy<V extends FsVolumeSpi>
    implements VolumeChoosingPolicy<V>, Configurable {
  
  private static final Log LOG = LogFactory.getLog(AvailableSpaceVolumeChoosingPolicy.class);
  
  private final Random random;
  
  private long balancedSpaceThreshold = DFS_DATANODE_AVAILABLE_SPACE_VOLUME_CHOOSING_POLICY_BALANCED_SPACE_THRESHOLD_DEFAULT;
  private float balancedPreferencePercent = DFS_DATANODE_AVAILABLE_SPACE_VOLUME_CHOOSING_POLICY_BALANCED_SPACE_PREFERENCE_FRACTION_DEFAULT;
 
  AvailableSpaceVolumeChoosingPolicy(Random random) {
    this.random = random;
  }
 
  public AvailableSpaceVolumeChoosingPolicy() {
    this(new Random());
  }
 
  @Override
  public synchronized void setConf(Configuration conf) {
    balancedSpaceThreshold = conf.getLong(
        DFS_DATANODE_AVAILABLE_SPACE_VOLUME_CHOOSING_POLICY_BALANCED_SPACE_THRESHOLD_KEY,
        DFS_DATANODE_AVAILABLE_SPACE_VOLUME_CHOOSING_POLICY_BALANCED_SPACE_THRESHOLD_DEFAULT);
    balancedPreferencePercent = conf.getFloat(
        DFS_DATANODE_AVAILABLE_SPACE_VOLUME_CHOOSING_POLICY_BALANCED_SPACE_PREFERENCE_FRACTION_KEY,
        DFS_DATANODE_AVAILABLE_SPACE_VOLUME_CHOOSING_POLICY_BALANCED_SPACE_PREFERENCE_FRACTION_DEFAULT);
    
    LOG.info("Available space volume choosing policy initialized: " +
        DFS_DATANODE_AVAILABLE_SPACE_VOLUME_CHOOSING_POLICY_BALANCED_SPACE_THRESHOLD_KEY +
        " = " + balancedSpaceThreshold + ", " +
        DFS_DATANODE_AVAILABLE_SPACE_VOLUME_CHOOSING_POLICY_BALANCED_SPACE_PREFERENCE_FRACTION_KEY +
        " = " + balancedPreferencePercent);
 
    if (balancedPreferencePercent > 1.0) {
      LOG.warn("The value of " + DFS_DATANODE_AVAILABLE_SPACE_VOLUME_CHOOSING_POLICY_BALANCED_SPACE_PREFERENCE_FRACTION_KEY +
               " is greater than 1.0 but should be in the range 0.0 - 1.0");
    }
 
    if (balancedPreferencePercent < 0.5) {
      LOG.warn("The value of " + DFS_DATANODE_AVAILABLE_SPACE_VOLUME_CHOOSING_POLICY_BALANCED_SPACE_PREFERENCE_FRACTION_KEY +
               " is less than 0.5 so volumes with less available disk space will receive more block allocations");
    }
  }
  
  @Override
  public synchronized Configuration getConf() {
    // Nothing to do. Only added to fulfill the Configurable contract.
    return null;
  }
  // 已平衡的卷的轮询策略
  private final VolumeChoosingPolicy<V> roundRobinPolicyBalanced =
      new RoundRobinVolumeChoosingPolicy<V>();
  // 可用空间多的卷的轮询策略
  private final VolumeChoosingPolicy<V> roundRobinPolicyHighAvailable =
      new RoundRobinVolumeChoosingPolicy<V>();
  // 可用空间少的卷的轮询策略
  private final VolumeChoosingPolicy<V> roundRobinPolicyLowAvailable =
      new RoundRobinVolumeChoosingPolicy<V>();
 
  @Override
  public synchronized V chooseVolume(List<V> volumes,
      long replicaSize) throws IOException {
    if (volumes.size() < 1) {
      throw new DiskOutOfSpaceException("No more available volumes");
    }
    
    AvailableSpaceVolumeList volumesWithSpaces =
        new AvailableSpaceVolumeList(volumes);
    // 如果卷都在平衡阈值之内,直接轮询
    if (volumesWithSpaces.areAllVolumesWithinFreeSpaceThreshold()) {
      // If they're actually not too far out of whack, fall back on pure round
      // robin.
      V volume = roundRobinPolicyBalanced.chooseVolume(volumes, replicaSize);
      if (LOG.isDebugEnabled()) {
        LOG.debug("All volumes are within the configured free space balance " +
            "threshold. Selecting " + volume + " for write of block size " +
            replicaSize);
      }
      return volume;
    } else {
      V volume = null;
      // If none of the volumes with low free space have enough space for the
      // replica, always try to choose a volume with a lot of free space.
      long mostAvailableAmongLowVolumes = volumesWithSpaces
          .getMostAvailableSpaceAmongVolumesWithLowAvailableSpace();
      // 分别获取可用空间多和少的卷列表
      List<V> highAvailableVolumes = extractVolumesFromPairs(
          volumesWithSpaces.getVolumesWithHighAvailableSpace());
      List<V> lowAvailableVolumes = extractVolumesFromPairs(
          volumesWithSpaces.getVolumesWithLowAvailableSpace());
      
      float preferencePercentScaler =
          (highAvailableVolumes.size() * balancedPreferencePercent) +
          (lowAvailableVolumes.size() * (1 - balancedPreferencePercent));
      // 计算平衡比值,balancedPreferencePercent越大,可用空间多的卷所占比重会变大
      float scaledPreferencePercent =
          (highAvailableVolumes.size() * balancedPreferencePercent) /
          preferencePercentScaler;
      // 如果可用空间少的卷不足以放得下副本,或者随机出来的概率比上面的比例小,就轮询可用空间多的卷
      if (mostAvailableAmongLowVolumes < replicaSize ||
          random.nextFloat() < scaledPreferencePercent) {
        volume = roundRobinPolicyHighAvailable.chooseVolume(
            highAvailableVolumes, replicaSize);
        if (LOG.isDebugEnabled()) {
          LOG.debug("Volumes are imbalanced. Selecting " + volume +
              " from high available space volumes for write of block size "
              + replicaSize);
        }
      } else {
        volume = roundRobinPolicyLowAvailable.chooseVolume(
            lowAvailableVolumes, replicaSize);
        if (LOG.isDebugEnabled()) {
          LOG.debug("Volumes are imbalanced. Selecting " + volume +
              " from low available space volumes for write of block size "
              + replicaSize);
        }
      }
      return volume;
    }
  }
}

这个策略可以在一定程度上削弱不平衡的现象,但仍然无法完全消除其影响。并且卷的可用空间只是诸多因素中的一个,仍然不够全面,磁盘I/O等指标也是比较重要的。但不管如何,它已经比纯轮询策略好得太多了。

修改卷选择策略

由hdfs-site.xml中的dfs.datanode.fsdataset.volume.choosing.policy属性来指定。可取的值为org.apache.hadoop.hdfs.server.datanode.fsdataset.RoundRobinVolumeChoosingPolicy或AvailableSpaceVolumeChoosingPolicy。
选择基于可用空间的策略,还有两个属性需要注意。

  • dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold
    默认值10737418240,即10G。它的含义是所有卷中最大可用空间与最小可用空间差值的阈值,如果小于这个阈值,则认为存储是平衡的,直接采用轮询来选择卷。
  • dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction
    默认值0.75。它的含义是数据块存储到可用空间多的卷上的概率,由此可见,这个值如果取0.5以下,对该策略而言是毫无意义的,一般就采用默认值。
  • 0
    点赞
  • 2
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值