2020-09-09

建康

Category: Hadoop. Tags: hadoop

Original article: https://blog.csdn.net/qq_29989725/article/details/108482518

When writing data, HDFS can choose a disk (volume) using one of two policies:

Round-robin policy (RoundRobinVolumeChoosingPolicy)
Available-space policy (AvailableSpaceVolumeChoosingPolicy)

1. Round-robin policy

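The idea of round robin is to iterate from object 1 to object n and then start again from 1. The source of the round-robin policy in HDFS is as follows: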

public class RoundRobinVolumeChoosingPolicy<V extends FsVolumeSpi>
    implements VolumeChoosingPolicy<V> {
  public static final Log LOG = LogFactory.getLog(RoundRobinVolumeChoosingPolicy.class);

  private int curVolume = 0;

  @Override
  public synchronized V chooseVolume(final List<V> volumes, long blockSize)
      throws IOException {

    if(volumes.size() < 1) {
      throw new DiskOutOfSpaceException("No more available volumes");
    }
    // since volumes could've been removed because of the failure
    // make sure we are not out of bounds
    if(curVolume >= volumes.size()) {
      curVolume = 0;
    }
    int startVolume = curVolume;
    long maxAvailable = 0;
    while (true) {
      final V volume = volumes.get(curVolume);
      curVolume = (curVolume + 1) % volumes.size();
      long availableVolumeSize = volume.getAvailable();
      if (availableVolumeSize > blockSize) {
        return volume;
      }
      if (availableVolumeSize > maxAvailable) {
        maxAvailable = availableVolumeSize;
      }
      if (curVolume == startVolume) {
        throw new DiskOutOfSpaceException("Out of space: "
            + "The volume with the most available space (=" + maxAvailable
            + " B) is less than the block size (=" + blockSize + " B).");
      }
    }
  }
}

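The round-robin policy keeps the number of writes per volume balanced, but it cannot keep the amount of data written balanced. For example, if in one round a 1 MB block lands on volume A while a 128 MB block lands on volume B, the data on A and B is no longer balanced. Over time the imbalance grows more and more severe.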
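The effect described above can be reproduced with a small simulation. This is a minimal sketch, not HDFS code: the class name RoundRobinImbalanceDemo, the simulate helper, and the alternating 1 MB / 128 MB workload are all assumptions made for illustration. Only the index update mirrors curVolume in the policy.

```java
import java.util.Arrays;

class RoundRobinImbalanceDemo {
    /**
     * Places each block on the next volume in turn.
     * Returns {writes per volume, bytes per volume}.
     */
    static long[][] simulate(int volumeCount, long[] blockSizes) {
        long[] writes = new long[volumeCount];
        long[] bytes = new long[volumeCount];
        int cur = 0; // plays the same role as curVolume in the policy
        for (long size : blockSizes) {
            writes[cur]++;
            bytes[cur] += size;
            cur = (cur + 1) % volumeCount;
        }
        return new long[][] {writes, bytes};
    }

    public static void main(String[] args) {
        final long mb = 1024L * 1024L;
        // Hypothetical workload: blocks alternate between 1 MB and 128 MB.
        long[] blocks = new long[100];
        for (int i = 0; i < blocks.length; i++) {
            blocks[i] = (i % 2 == 0) ? mb : 128 * mb;
        }
        long[][] result = simulate(2, blocks);
        long[] megabytes = Arrays.stream(result[1]).map(b -> b / mb).toArray();
        System.out.println("writes per volume: " + Arrays.toString(result[0]));
        System.out.println("MB per volume:     " + Arrays.toString(megabytes));
    }
}
```

With this workload both volumes receive 50 writes each, yet one ends up with 50 MB and the other with 6400 MB: the write counts are balanced while the data volume is not.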

2. Available-space policy

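This policy is somewhat smarter than round robin. Using a free-space threshold, it splits the volumes into two groups: volumes with more available space and volumes with less. It then picks the group with more available space with a relatively high probability. Whichever group is chosen, a round-robin policy selects the volume within that group. If all volumes are already within the threshold of one another, it falls back to plain round robin. Both the free-space threshold and the selection probability are configurable.
An excerpt of its source follows. It uses a three-argument chooseVolume that also takes a storageId, so it appears to come from a later Hadoop version than the round-robin excerpt above: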

@Override
public Configuration getConf() {
  // Nothing to do. Only added to fulfill the Configurable contract.
  return null;
}

// Round-robin policy used when the volumes are already balanced
private final VolumeChoosingPolicy<V> roundRobinPolicyBalanced =
    new RoundRobinVolumeChoosingPolicy<V>();
// Round-robin policy over the volumes with more available space
private final VolumeChoosingPolicy<V> roundRobinPolicyHighAvailable =
    new RoundRobinVolumeChoosingPolicy<V>();
// Round-robin policy over the volumes with less available space
private final VolumeChoosingPolicy<V> roundRobinPolicyLowAvailable =
    new RoundRobinVolumeChoosingPolicy<V>();

@Override
public V chooseVolume(List<V> volumes, long replicaSize, String storageId)
    throws IOException {
  if (volumes.size() < 1) {
    throw new DiskOutOfSpaceException("No more available volumes");
  }
  // As all the items in volumes are with the same storage type,
  // so only need to get the storage type index of the first item in volumes
  StorageType storageType = volumes.get(0).getStorageType();
  int index = storageType != null ?
          storageType.ordinal() : StorageType.DEFAULT.ordinal();

  synchronized (syncLocks[index]) {
    return doChooseVolume(volumes, replicaSize, storageId);
  }
}

private V doChooseVolume(final List<V> volumes, long replicaSize,
    String storageId) throws IOException {
  AvailableSpaceVolumeList volumesWithSpaces =
      new AvailableSpaceVolumeList(volumes);
  // If all volumes are within the balance threshold, use plain round robin
  if (volumesWithSpaces.areAllVolumesWithinFreeSpaceThreshold()) {
    // If they're actually not too far out of whack, fall back on pure round
    // robin.
    V volume = roundRobinPolicyBalanced.chooseVolume(volumes, replicaSize,
        storageId);
    if (LOG.isDebugEnabled()) {
      LOG.debug("All volumes are within the configured free space balance " +
          "threshold. Selecting " + volume + " for write of block size " +
          replicaSize);
    }
    return volume;
  } else {
    V volume = null;
    // If none of the volumes with low free space have enough space for the
    // replica, always try to choose a volume with a lot of free space.
    long mostAvailableAmongLowVolumes = volumesWithSpaces
        .getMostAvailableSpaceAmongVolumesWithLowAvailableSpace();
    // Collect the lists of volumes with more and with less available space
    List<V> highAvailableVolumes = extractVolumesFromPairs(
        volumesWithSpaces.getVolumesWithHighAvailableSpace());
    List<V> lowAvailableVolumes = extractVolumesFromPairs(
        volumesWithSpaces.getVolumesWithLowAvailableSpace());
    float preferencePercentScaler =
        (highAvailableVolumes.size() * balancedPreferencePercent) +
        (lowAvailableVolumes.size() * (1 - balancedPreferencePercent));
    // Compute the scaled preference ratio: the larger balancedPreferencePercent
    // is, the more weight the volumes with more available space receive
    float scaledPreferencePercent =
        (highAvailableVolumes.size() * balancedPreferencePercent) /
        preferencePercentScaler;
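    // If no low-available-space volume can hold the replica, or the random
    // draw is below the ratio above, round-robin over the high-available-space volumes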
    if (mostAvailableAmongLowVolumes < replicaSize ||
        random.nextFloat() < scaledPreferencePercent) {
      volume = roundRobinPolicyHighAvailable.chooseVolume(
          highAvailableVolumes, replicaSize, storageId);
      if (LOG.isDebugEnabled()) {
        LOG.debug("Volumes are imbalanced. Selecting " + volume +
            " from high available space volumes for write of block size "
            + replicaSize);
      }
    } else {
      volume = roundRobinPolicyLowAvailable.chooseVolume(
          lowAvailableVolumes, replicaSize, storageId);
      if (LOG.isDebugEnabled()) {
        LOG.debug("Volumes are imbalanced. Selecting " + volume +
            " from low available space volumes for write of block size "
            + replicaSize);
      }
    }
    return volume;
  }
}

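This policy mitigates the imbalance to some extent, but it cannot eliminate it entirely. Available space is also only one of many factors, so the policy is still not comprehensive: metrics such as disk I/O matter as well. Even so, it is considerably better than pure round robin.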
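To get a feel for how often the high-available-space group is picked, the arithmetic in doChooseVolume can be evaluated on its own. The sketch below copies only the two formulas for preferencePercentScaler and scaledPreferencePercent. The class name PreferenceFractionDemo, the group sizes, and the value 0.75 for balancedPreferencePercent are illustrative assumptions.

```java
class PreferenceFractionDemo {
    /**
     * Mirrors the arithmetic in doChooseVolume: the probability of choosing
     * the group of volumes with more available space.
     */
    static float scaledPreferencePercent(int highCount, int lowCount,
                                         float balancedPreferencePercent) {
        float scaler = (highCount * balancedPreferencePercent)
            + (lowCount * (1 - balancedPreferencePercent));
        return (highCount * balancedPreferencePercent) / scaler;
    }

    public static void main(String[] args) {
        // 0.75f is an assumed, illustrative preference value.
        System.out.println("2 high, 2 low: " + scaledPreferencePercent(2, 2, 0.75f));
        System.out.println("1 high, 3 low: " + scaledPreferencePercent(1, 3, 0.75f));
        System.out.println("3 high, 1 low: " + scaledPreferencePercent(3, 1, 0.75f));
    }
}
```

With equal group sizes the probability equals the configured preference (0.75 here). With one high volume against three low ones it drops to about 0.5, and with three high volumes against one low one it rises to about 0.9. The ratio is scaled by group size, so the preference applies per volume rather than per group.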

3. Changing the volume choosing policy

In hdfs-site.xml, the property dfs.datanode.fsdataset.volume.choosing.policy accepts the following values:
org.apache.hadoop.hdfs.server.datanode.fsdataset.RoundRobinVolumeChoosingPolicy (round-robin policy, the default)
org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy (available-space policy)
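
As a sketch, switching a DataNode to the available-space policy could look like the fragment below. The first property comes from the text above. The two tuning properties are not mentioned in the original article: their names and values are given as they are understood to appear in hdfs-default.xml, so check them against the documentation for your Hadoop version before relying on them.

```xml
<!-- hdfs-site.xml on each DataNode -->
<property>
  <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
  <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>

<!-- Assumed tuning knobs; verify names and defaults for your version. -->
<property>
  <!-- Free-space difference, in bytes, below which volumes count as balanced -->
  <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold</name>
  <value>10737418240</value>
</property>
<property>
  <!-- Fraction of writes steered toward volumes with more available space -->
  <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction</name>
  <value>0.75</value>
</property>
```

A change like this normally requires restarting the DataNode before it takes effect.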
