2020-09-09

HDFS has two policies for choosing a disk (volume) when writing data:

  • Round-robin policy (RoundRobinVolumeChoosingPolicy)
  • Available-space policy (AvailableSpaceVolumeChoosingPolicy)

1. Round-robin policy

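The idea of round-robin is to iterate from object 1 through object n and then start again from 1. A volume is skipped when its available space does not exceed the block size. The source code of the round-robin policy in HDFS is as follows: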

public class RoundRobinVolumeChoosingPolicy<V extends FsVolumeSpi>
    implements VolumeChoosingPolicy<V> {
  public static final Log LOG = LogFactory.getLog(RoundRobinVolumeChoosingPolicy.class);

  private int curVolume = 0;

  @Override
  public synchronized V chooseVolume(final List<V> volumes, long blockSize)
      throws IOException {

    if(volumes.size() < 1) {
      throw new DiskOutOfSpaceException("No more available volumes");
    }
    // since volumes could've been removed because of the failure
    // make sure we are not out of bounds
    if(curVolume >= volumes.size()) {
      curVolume = 0;
    }
    int startVolume = curVolume;
    long maxAvailable = 0;
    while (true) {
      final V volume = volumes.get(curVolume);
      curVolume = (curVolume + 1) % volumes.size();
      long availableVolumeSize = volume.getAvailable();
      if (availableVolumeSize > blockSize) {
        return volume;
      }
      if (availableVolumeSize > maxAvailable) {
        maxAvailable = availableVolumeSize;
      }
      if (curVolume == startVolume) {
        throw new DiskOutOfSpaceException("Out of space: "
            + "The volume with the most available space (=" + maxAvailable
            + " B) is less than the block size (=" + blockSize + " B).");
      }
    }
  }
}

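The round-robin policy keeps the number of writes to each volume balanced, but it cannot keep the amount of data written balanced. For example, if a 1 MB block is written to volume A and a 128 MB block is written to volume B in the same pass, the data on A and B is no longer balanced. Over time the imbalance becomes more and more severe.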
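
The imbalance described above can be reproduced with a small simulation. This is only a sketch, not HDFS code: the class name RoundRobinImbalanceDemo and the method simulate are made up for illustration, and the free-space check of the real policy is left out so that only the rotation is modelled.

```java
// Hypothetical demo class (not part of Hadoop): models only the rotation of
// the round-robin policy and ignores the available-space check.
class RoundRobinImbalanceDemo {

    /** Places each block on the next volume in turn; returns bytes written per volume. */
    static long[] simulate(int volumeCount, long[] blockSizes) {
        long[] bytesPerVolume = new long[volumeCount];
        int curVolume = 0;
        for (long blockSize : blockSizes) {
            bytesPerVolume[curVolume] += blockSize;
            // Same wrap-around step as in RoundRobinVolumeChoosingPolicy.
            curVolume = (curVolume + 1) % volumeCount;
        }
        return bytesPerVolume;
    }

    public static void main(String[] args) {
        final long mb = 1024L * 1024L;
        // Small and large blocks alternate, so volume 0 only ever receives
        // 1 MB blocks and volume 1 only ever receives 128 MB blocks.
        long[] blockSizes = {1 * mb, 128 * mb, 1 * mb, 128 * mb, 1 * mb, 128 * mb};
        long[] written = simulate(2, blockSizes);
        for (int i = 0; i < written.length; i++) {
            System.out.println("volume " + i + ": " + (written[i] / mb) + " MB");
        }
    }
}
```

Each volume receives three blocks, yet volume 0 ends up with 3 MB and volume 1 with 384 MB: the write counts are equal while the data volumes are not.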

2. Available-space policy

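This policy is somewhat smarter than round-robin. Using an available-space threshold, it divides the volumes into two groups: volumes with more available space and volumes with less. It then picks the group with more available space with a relatively high probability. Whichever group is picked, the volumes inside that group are written in round-robin order. Both the available-space threshold and the selection probability can be set through configuration parameters.
Its source code is as follows: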

@Override
public Configuration getConf() {
  // Nothing to do. Only added to fulfill the Configurable contract.
  return null;
}

// Round-robin policy used when the volumes are already balanced
private final VolumeChoosingPolicy<V> roundRobinPolicyBalanced =
    new RoundRobinVolumeChoosingPolicy<V>();
// Round-robin policy for the volumes with more available space
private final VolumeChoosingPolicy<V> roundRobinPolicyHighAvailable =
    new RoundRobinVolumeChoosingPolicy<V>();
// Round-robin policy for the volumes with less available space
private final VolumeChoosingPolicy<V> roundRobinPolicyLowAvailable =
    new RoundRobinVolumeChoosingPolicy<V>();

@Override
public V chooseVolume(List<V> volumes, long replicaSize, String storageId)
    throws IOException {
  if (volumes.size() < 1) {
    throw new DiskOutOfSpaceException("No more available volumes");
  }
  // As all the items in volumes are with the same storage type,
  // so only need to get the storage type index of the first item in volumes
  StorageType storageType = volumes.get(0).getStorageType();
  int index = storageType != null ?
          storageType.ordinal() : StorageType.DEFAULT.ordinal();

  synchronized (syncLocks[index]) {
    return doChooseVolume(volumes, replicaSize, storageId);
  }
}

private V doChooseVolume(final List<V> volumes, long replicaSize,
    String storageId) throws IOException {
  AvailableSpaceVolumeList volumesWithSpaces =
      new AvailableSpaceVolumeList(volumes);
  // If all volumes are within the balance threshold, use plain round-robin
  if (volumesWithSpaces.areAllVolumesWithinFreeSpaceThreshold()) {
    // If they're actually not too far out of whack, fall back on pure round
    // robin.
    V volume = roundRobinPolicyBalanced.chooseVolume(volumes, replicaSize,
        storageId);
    if (LOG.isDebugEnabled()) {
      LOG.debug("All volumes are within the configured free space balance " +
          "threshold. Selecting " + volume + " for write of block size " +
          replicaSize);
    }
    return volume;
  } else {
    V volume = null;
    // If none of the volumes with low free space have enough space for the
    // replica, always try to choose a volume with a lot of free space.
    long mostAvailableAmongLowVolumes = volumesWithSpaces
        .getMostAvailableSpaceAmongVolumesWithLowAvailableSpace();
    // Get the lists of volumes with more and with less available space
    List<V> highAvailableVolumes = extractVolumesFromPairs(
        volumesWithSpaces.getVolumesWithHighAvailableSpace());
    List<V> lowAvailableVolumes = extractVolumesFromPairs(
        volumesWithSpaces.getVolumesWithLowAvailableSpace());
    float preferencePercentScaler =
        (highAvailableVolumes.size() * balancedPreferencePercent) +
        (lowAvailableVolumes.size() * (1 - balancedPreferencePercent));
    // Compute the scaled preference ratio: the larger balancedPreferencePercent
    // is, the more weight the volumes with more available space receive
    float scaledPreferencePercent =
        (highAvailableVolumes.size() * balancedPreferencePercent) /
        preferencePercentScaler;
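    // If no low-space volume can hold the replica, or the random draw is below
    // the ratio above, round-robin over the volumes with more available space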
    if (mostAvailableAmongLowVolumes < replicaSize ||
        random.nextFloat() < scaledPreferencePercent) {
      volume = roundRobinPolicyHighAvailable.chooseVolume(
          highAvailableVolumes, replicaSize, storageId);
      if (LOG.isDebugEnabled()) {
        LOG.debug("Volumes are imbalanced. Selecting " + volume +
            " from high available space volumes for write of block size "
            + replicaSize);
      }
    } else {
      volume = roundRobinPolicyLowAvailable.chooseVolume(
          lowAvailableVolumes, replicaSize, storageId);
      if (LOG.isDebugEnabled()) {
        LOG.debug("Volumes are imbalanced. Selecting " + volume +
            " from low available space volumes for write of block size "
            + replicaSize);
      }
    }
    return volume;
  }
}

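This policy reduces the imbalance to some extent, but it still cannot eliminate it completely. Moreover, available space is only one of many factors and does not give the full picture; metrics such as disk I/O matter as well. Even so, it is already much better than pure round-robin.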
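
To get a feel for the probability computed in doChooseVolume, the two lines of arithmetic can be pulled out and evaluated with small numbers. The sketch below is a standalone illustration, not Hadoop code: the class name PreferenceProbabilityDemo and the method scaledPreference are hypothetical, and 0.75 is used as the preference fraction on the assumption that the default value is in effect.

```java
// Hypothetical demo class (not part of Hadoop): repeats the arithmetic that
// doChooseVolume uses to weight the group of high-available-space volumes.
class PreferenceProbabilityDemo {

    /**
     * Probability of picking the high-available-space group, given the number
     * of volumes in each group and the configured preference fraction.
     */
    static float scaledPreference(int highCount, int lowCount,
            float balancedPreferencePercent) {
        float preferencePercentScaler =
            (highCount * balancedPreferencePercent)
            + (lowCount * (1 - balancedPreferencePercent));
        return (highCount * balancedPreferencePercent) / preferencePercentScaler;
    }

    public static void main(String[] args) {
        // Assumed preference fraction; change it to see how the weights move.
        float fraction = 0.75f;
        int[][] groupSizes = {{3, 1}, {2, 2}, {1, 3}};
        for (int[] sizes : groupSizes) {
            System.out.println(sizes[0] + " high / " + sizes[1] + " low -> "
                + scaledPreference(sizes[0], sizes[1], fraction));
        }
    }
}
```

With three high-space volumes and one low-space volume the high-space group is chosen roughly 90% of the time; with two and two it is 75%; with one and three it drops to 50%. The group probability therefore scales with the group size, so what the fraction really expresses is a per-volume preference.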

3. Changing the volume choosing policy

The property dfs.datanode.fsdataset.volume.choosing.policy in hdfs-site.xml accepts the following values:
org.apache.hadoop.hdfs.server.datanode.fsdataset.RoundRobinVolumeChoosingPolicy (round-robin policy, the default)
org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy (available-space policy)
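
As a rough illustration, a DataNode could be switched to the available-space policy with an hdfs-site.xml fragment like the one below. The two tuning properties are not mentioned in the text above; they are taken from hdfs-default.xml, and the values shown (10 GB and 0.75) are meant to match the shipped defaults. Check them against the hdfs-default.xml of the Hadoop version in use before relying on them.

```xml
<!-- Sketch of an hdfs-site.xml fragment; verify names and defaults for your Hadoop version. -->
<property>
  <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
  <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>
<property>
  <!-- Maximum difference in free space, in bytes, for volumes to count as balanced. -->
  <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold</name>
  <value>10737418240</value>
</property>
<property>
  <!-- Corresponds to balancedPreferencePercent in the source shown earlier. -->
  <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction</name>
  <value>0.75</value>
</property>
```

The DataNode typically has to be restarted for the new policy to take effect.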
