When writing data, HDFS can choose the target disk (volume) using one of two policies:
- Round-robin policy (RoundRobinVolumeChoosingPolicy)
- Available-space policy (AvailableSpaceVolumeChoosingPolicy)
1. Round-robin policy
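The idea behind round-robin is to iterate from object 1 to object n and then start again from 1. The HDFS source code for the round-robin policy is as follows: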
public class RoundRobinVolumeChoosingPolicy<V extends FsVolumeSpi>
    implements VolumeChoosingPolicy<V> {
  public static final Log LOG = LogFactory.getLog(RoundRobinVolumeChoosingPolicy.class);

  private int curVolume = 0;

  @Override
  public synchronized V chooseVolume(final List<V> volumes, long blockSize)
      throws IOException {
    if (volumes.size() < 1) {
      throw new DiskOutOfSpaceException("No more available volumes");
    }
    // since volumes could've been removed because of the failure
    // make sure we are not out of bounds
    if (curVolume >= volumes.size()) {
      curVolume = 0;
    }
    int startVolume = curVolume;
    long maxAvailable = 0;
    while (true) {
      final V volume = volumes.get(curVolume);
      curVolume = (curVolume + 1) % volumes.size();
      long availableVolumeSize = volume.getAvailable();
      if (availableVolumeSize > blockSize) {
        return volume;
      }
      if (availableVolumeSize > maxAvailable) {
        maxAvailable = availableVolumeSize;
      }
      if (curVolume == startVolume) {
        throw new DiskOutOfSpaceException("Out of space: "
            + "The volume with the most available space (=" + maxAvailable
            + " B) is less than the block size (=" + blockSize + " B).");
      }
    }
  }
}
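The round-robin policy keeps the number of writes balanced across volumes, but it cannot keep the amount of data written balanced. For example, if a 1 MB block is written to volume A and a 128 MB block is written to volume B, the data on A and B is no longer balanced. Over time, this imbalance becomes more and more severe.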
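To make the imbalance concrete, here is a minimal, self-contained simulation. It is only a sketch: the class and method names are hypothetical, and it deliberately ignores the free-space check that the real policy performs, so it shows nothing more than the rotation itself.

```java
import java.util.Arrays;

public class RoundRobinImbalanceDemo {

    /**
     * Distributes the given blocks over volumeCount volumes in strict
     * round-robin order (hypothetical helper, not part of Hadoop).
     * Returns { writesPerVolume, bytesPerVolume }.
     */
    static long[][] simulate(int volumeCount, long[] blockSizes) {
        long[] writes = new long[volumeCount];
        long[] bytes = new long[volumeCount];
        int cur = 0;
        for (long size : blockSizes) {
            writes[cur]++;
            bytes[cur] += size;
            cur = (cur + 1) % volumeCount; // same rotation as curVolume above
        }
        return new long[][] {writes, bytes};
    }

    public static void main(String[] args) {
        final long mb = 1024L * 1024L;
        // Small and full-size blocks happen to alternate.
        long[] blocks = {1 * mb, 128 * mb, 1 * mb, 128 * mb, 1 * mb, 128 * mb};
        long[][] result = simulate(2, blocks);
        System.out.println("writes per volume: " + Arrays.toString(result[0]));
        System.out.println("bytes per volume:  " + Arrays.toString(result[1]));
    }
}
```

In this run both volumes receive three writes each, yet the first volume ends up with 3 MB and the second with 384 MB.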
2. Available-space policy
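This policy is somewhat smarter than round-robin. Based on an available-space threshold, it divides the volumes into two groups: volumes with a lot of available space and volumes with little available space. It then picks the group with a lot of available space with a relatively high probability. Whichever group is picked, the volume within that group is ultimately chosen by round-robin. Both the available-space threshold and the selection probability can be set through configuration parameters.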
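The grouping step can be sketched as follows. This is an illustration under an assumption that the article itself does not state: a volume counts as "low" when its available space is at most the smallest available space plus the threshold, which is how I read Hadoop's AvailableSpaceVolumeList. All class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class VolumeClassifierDemo {

    /** Indices of the volumes in each group (hypothetical helper type). */
    static final class Split {
        final List<Integer> low = new ArrayList<>();
        final List<Integer> high = new ArrayList<>();
    }

    /** True when the gap between the fullest and emptiest volume is below the threshold. */
    static boolean allWithinThreshold(long[] available, long threshold) {
        long least = Long.MAX_VALUE;
        long most = 0;
        for (long a : available) {
            least = Math.min(least, a);
            most = Math.max(most, a);
        }
        return most - least < threshold;
    }

    /** Assumed rule: "low" means available <= least available + threshold. */
    static Split classify(long[] available, long threshold) {
        long least = Long.MAX_VALUE;
        for (long a : available) {
            least = Math.min(least, a);
        }
        Split split = new Split();
        for (int i = 0; i < available.length; i++) {
            if (available[i] <= least + threshold) {
                split.low.add(i);
            } else {
                split.high.add(i);
            }
        }
        return split;
    }

    public static void main(String[] args) {
        final long gb = 1024L * 1024L * 1024L;
        long[] available = {100 * gb, 105 * gb, 500 * gb, 520 * gb};
        long threshold = 10 * gb; // illustrative value
        System.out.println("balanced: " + allWithinThreshold(available, threshold));
        Split split = classify(available, threshold);
        System.out.println("low:  " + split.low);
        System.out.println("high: " + split.high);
    }
}
```

With these illustrative numbers, volumes 0 and 1 fall into the low group and volumes 2 and 3 into the high group, so the policy would not fall back to plain round-robin.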
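The source code of this policy is as follows. Note that this excerpt comes from a different Hadoop version than the round-robin listing above: here chooseVolume takes an additional storageId argument.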
@Override
public Configuration getConf() {
  // Nothing to do. Only added to fulfill the Configurable contract.
  return null;
}

// Round-robin policy used when the volumes are already balanced
private final VolumeChoosingPolicy<V> roundRobinPolicyBalanced =
    new RoundRobinVolumeChoosingPolicy<V>();
// Round-robin policy for the volumes with a lot of available space
private final VolumeChoosingPolicy<V> roundRobinPolicyHighAvailable =
    new RoundRobinVolumeChoosingPolicy<V>();
// Round-robin policy for the volumes with little available space
private final VolumeChoosingPolicy<V> roundRobinPolicyLowAvailable =
    new RoundRobinVolumeChoosingPolicy<V>();

@Override
public V chooseVolume(List<V> volumes, long replicaSize, String storageId)
    throws IOException {
  if (volumes.size() < 1) {
    throw new DiskOutOfSpaceException("No more available volumes");
  }
  // As all the items in volumes are with the same storage type,
  // so only need to get the storage type index of the first item in volumes
  StorageType storageType = volumes.get(0).getStorageType();
  int index = storageType != null ?
      storageType.ordinal() : StorageType.DEFAULT.ordinal();
  synchronized (syncLocks[index]) {
    return doChooseVolume(volumes, replicaSize, storageId);
  }
}
private V doChooseVolume(final List<V> volumes, long replicaSize,
    String storageId) throws IOException {
  AvailableSpaceVolumeList volumesWithSpaces =
      new AvailableSpaceVolumeList(volumes);
  // If all volumes are within the balance threshold, just use round-robin
  if (volumesWithSpaces.areAllVolumesWithinFreeSpaceThreshold()) {
    // If they're actually not too far out of whack, fall back on pure round
    // robin.
    V volume = roundRobinPolicyBalanced.chooseVolume(volumes, replicaSize,
        storageId);
    if (LOG.isDebugEnabled()) {
      LOG.debug("All volumes are within the configured free space balance " +
          "threshold. Selecting " + volume + " for write of block size " +
          replicaSize);
    }
    return volume;
  } else {
    V volume = null;
    // If none of the volumes with low free space have enough space for the
    // replica, always try to choose a volume with a lot of free space.
    long mostAvailableAmongLowVolumes = volumesWithSpaces
        .getMostAvailableSpaceAmongVolumesWithLowAvailableSpace();
    // Get the lists of volumes with high and with low available space
    List<V> highAvailableVolumes = extractVolumesFromPairs(
        volumesWithSpaces.getVolumesWithHighAvailableSpace());
    List<V> lowAvailableVolumes = extractVolumesFromPairs(
        volumesWithSpaces.getVolumesWithLowAvailableSpace());
    float preferencePercentScaler =
        (highAvailableVolumes.size() * balancedPreferencePercent) +
        (lowAvailableVolumes.size() * (1 - balancedPreferencePercent));
    // Compute the scaled preference: the larger balancedPreferencePercent is,
    // the more weight the volumes with high available space receive
    float scaledPreferencePercent =
        (highAvailableVolumes.size() * balancedPreferencePercent) /
        preferencePercentScaler;
    // If no low-space volume can hold the replica, or the random draw is below
    // the scaled preference, round-robin over the high-space volumes
    if (mostAvailableAmongLowVolumes < replicaSize ||
        random.nextFloat() < scaledPreferencePercent) {
      volume = roundRobinPolicyHighAvailable.chooseVolume(
          highAvailableVolumes, replicaSize, storageId);
      if (LOG.isDebugEnabled()) {
        LOG.debug("Volumes are imbalanced. Selecting " + volume +
            " from high available space volumes for write of block size "
            + replicaSize);
      }
    } else {
      volume = roundRobinPolicyLowAvailable.chooseVolume(
          lowAvailableVolumes, replicaSize, storageId);
      if (LOG.isDebugEnabled()) {
        LOG.debug("Volumes are imbalanced. Selecting " + volume +
            " from low available space volumes for write of block size "
            + replicaSize);
      }
    }
    return volume;
  }
}
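To get a feel for scaledPreferencePercent, the small sketch below reproduces just that arithmetic outside of Hadoop. The class and method names are hypothetical, and the preference value 0.75 is an assumed example value rather than something taken from the listing.

```java
import java.util.Locale;

public class PreferenceProbabilityDemo {

    /**
     * Same formula as in doChooseVolume: the probability of picking the
     * high-available-space group, weighted by the size of each group.
     */
    static float scaledPreference(int highCount, int lowCount, float preference) {
        float scaler = highCount * preference + lowCount * (1 - preference);
        return (highCount * preference) / scaler;
    }

    public static void main(String[] args) {
        float preference = 0.75f; // assumed example value
        int[][] cases = {{3, 1}, {1, 1}, {1, 3}};
        for (int[] c : cases) {
            System.out.printf(Locale.ROOT, "high=%d low=%d -> P(high group)=%.2f%n",
                c[0], c[1], scaledPreference(c[0], c[1], preference));
        }
    }
}
```

With three high-space volumes and one low-space volume, the high group is picked about 90% of the time; with one of each, 75%; with one high-space and three low-space volumes, 50%. In that last case the single high-space volume still receives about three times as many writes as each individual low-space volume, because the low group's share is spread over three volumes.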
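This policy mitigates the imbalance to some extent, but it still cannot eliminate it completely. Moreover, available space is only one of many relevant factors and does not give the full picture; metrics such as disk I/O matter as well. Even so, it is already much better than pure round-robin.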
3. Changing the volume choosing policy
In hdfs-site.xml, the property dfs.datanode.fsdataset.volume.choosing.policy accepts the following values:
- org.apache.hadoop.hdfs.server.datanode.fsdataset.RoundRobinVolumeChoosingPolicy (round-robin policy, the default)
- org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy (available-space policy)
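As a rough illustration, switching a DataNode to the available-space policy could look like the fragment below. The first property is the one named above. The two tuning properties and their values are quoted from my recollection of hdfs-default.xml, so treat them as assumptions and check them against the documentation of your Hadoop version.

```xml
<property>
  <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
  <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>
<!-- Assumed name: the available-space threshold, in bytes (10 GB here) -->
<property>
  <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold</name>
  <value>10737418240</value>
</property>
<!-- Assumed name: preference for the group with more available space -->
<property>
  <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction</name>
  <value>0.75</value>
</property>
```

The setting is read by the DataNode, so it would normally need to be present in the hdfs-site.xml of each DataNode, followed by a DataNode restart.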