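DRF (Dominant Resource Fairness) is a general max-min fairness allocation strategy for multiple resource types. Its core idea is that, in a multi-resource environment, a user's allocation should be determined by the user's dominant share. The dominant resource is the resource for which the user holds the largest fraction of system capacity among all resources allocated to that user, and the dominant share is that fraction. In short, DRF tries to maximize the smallest dominant share across all users.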
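The definition of the dominant share can be made concrete with a few lines of Java. This is a minimal sketch, not code from any scheduler: the class `DominantShareSketch` and its method names are hypothetical, and resources are modeled as plain `double` arrays ordered as <CPU, memory>.

```java
/** Minimal sketch: dominant share and dominant resource of one user. */
final class DominantShareSketch {

  /** Returns max over j of allocated[j] / capacity[j]. */
  static double dominantShare(double[] allocated, double[] capacity) {
    double share = 0.0;
    for (int j = 0; j < capacity.length; j++) {
      share = Math.max(share, allocated[j] / capacity[j]);
    }
    return share;
  }

  /** Returns the index of the resource that attains the dominant share. */
  static int dominantResource(double[] allocated, double[] capacity) {
    int best = 0;
    for (int j = 1; j < capacity.length; j++) {
      if (allocated[j] / capacity[j] > allocated[best] / capacity[best]) {
        best = j;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    double[] capacity = {9, 18};   // assumed cluster: 9 CPU cores, 18 GB memory
    double[] user = {2, 8};        // assumed allocation: 2 cores, 8 GB
    // CPU share is 2/9, memory share is 8/18, so memory is dominant.
    System.out.println("dominant share    = " + dominantShare(user, capacity));
    System.out.println("dominant resource = " + dominantResource(user, capacity));
  }
}
```

A user holding <2 CPU, 8 GB> on a <9 CPU, 18 GB> cluster has a CPU share of 2/9 and a memory share of 8/18, so memory (index 1) is the dominant resource here.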
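1. How DRF Is Computed
Assume the system has 9 CPU cores and 18 GB of memory. Each task of application A requests <1 CPU, 4 GB>, and each task of application B requests <3 CPU, 1 GB>. How can a fair allocation policy be constructed for this case?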
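One common way to reason about this example is to write it as a small optimization problem. The derivation below is a sketch under the assumption that both applications have unlimited tasks; the symbols $x$ and $y$ (the number of tasks granted to A and B) are introduced here for illustration. Each A task uses 1/9 of the CPU and 4/18 of the memory, so memory is A's dominant resource; each B task uses 3/9 of the CPU and 1/18 of the memory, so CPU is B's dominant resource.

$$
\begin{aligned}
\max\ & (x, y) \\
\text{s.t.}\ & x + 3y \le 9 && \text{(CPU capacity)} \\
& 4x + y \le 18 && \text{(memory capacity)} \\
& \frac{4x}{18} = \frac{3y}{9} && \text{(equal dominant shares)}
\end{aligned}
$$

The equality gives $y = \tfrac{2}{3}x$. Substituting into the CPU constraint yields $3x \le 9$, and into the memory constraint yields $\tfrac{14}{3}x \le 18$, so CPU binds first and $x = 3$, $y = 2$. Application A receives <3 CPU, 12 GB> and application B receives <6 CPU, 2 GB>, giving both a dominant share of $2/3$.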
2. DRF Pseudocode
$$
\begin{aligned}
& \textbf{Algorithm: DRF pseudocode} \\
& \textbf{Assumptions} \\
& \qquad R \quad \text{total system resource capacity} \\
& \qquad C \quad \text{resources already allocated by the system} \\
& \qquad s \quad \text{dominant shares of the applications} \\
& \qquad U_i \quad \text{resources allocated to application } i \\
& \qquad D_i \quad \text{resources required by the next task of application } i \text{ (the application with the smallest dominant share)} \\
& \textbf{Initialization} \\
& \qquad R = \{r_1, \cdots, r_m\} && \text{system resource capacity} \\
& \qquad C = \{c_1, \cdots, c_m\} && \text{resources already allocated, initialized to } 0 \\
& \qquad s = \overbrace{\{0, \cdots, 0\}}^{n} && \text{dominant share of each of the } n \text{ applications, initialized to } 0 \\
& \qquad U_i = \{u_{i1}, \cdots, u_{im}\} && \text{resources allocated to application } i \text{, one entry per system resource, i.e. } m \text{ entries} \\
& \textbf{Allocation logic} \\
& \qquad \textbf{if } (C + D_i) \le R \textbf{ then} \\
& \qquad \qquad C = C + D_i && \text{update the allocated-resource vector} \\
& \qquad \qquad U_i = U_i + D_i && \text{update the allocation vector of application } i \\
& \qquad \qquad s_i = \max_{j=1}^{m} \{u_{ij} / r_j\} && \text{recompute the dominant share of application } i \\
& \qquad \textbf{else} \\
& \qquad \qquad \textbf{return} \\
& \qquad \textbf{end if}
\end{aligned}
$$
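The pseudocode can be turned into a small runnable loop. The following is a minimal sketch, not a production scheduler: the class `DrfAllocationSketch`, the `pending` task counters, and the rule that ties go to the lower application index are all assumptions made for illustration. Field names map onto the pseudocode as R = `capacity`, C = `consumed`, D_i = `demand[i]`, U_i = `allocated[i]`, and s_i = `share[i]`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Minimal sketch of a DRF allocation loop over plain double vectors. */
final class DrfAllocationSketch {
  final double[] capacity;                         // R
  final double[] consumed;                         // C
  final double[][] demand;                         // D_i, per-task demand
  final double[][] allocated;                      // U_i
  final double[] share;                            // s_i
  final int[] pending;                             // assumption: tasks still wanted
  final List<Integer> order = new ArrayList<>();   // app index chosen at each step

  DrfAllocationSketch(double[] capacity, double[][] demand, int[] tasks) {
    this.capacity = capacity.clone();
    this.consumed = new double[capacity.length];
    this.demand = demand;
    this.allocated = new double[demand.length][capacity.length];
    this.share = new double[demand.length];
    this.pending = tasks.clone();
  }

  /** Grants one task at a time until the chosen app's next task no longer fits. */
  void run() {
    while (true) {
      int i = pickApp();
      if (i < 0 || !fits(demand[i])) {
        return;                                    // the "else return" branch
      }
      share[i] = 0.0;
      for (int j = 0; j < capacity.length; j++) {
        consumed[j] += demand[i][j];               // C = C + D_i
        allocated[i][j] += demand[i][j];           // U_i = U_i + D_i
        share[i] = Math.max(share[i], allocated[i][j] / capacity[j]);
      }
      pending[i]--;
      order.add(i);
    }
  }

  /** App with pending tasks and the smallest dominant share; ties go to the lower index. */
  private int pickApp() {
    int best = -1;
    for (int i = 0; i < share.length; i++) {
      if (pending[i] > 0 && (best < 0 || share[i] < share[best])) {
        best = i;
      }
    }
    return best;
  }

  /** True if C + D <= R holds for every resource. */
  private boolean fits(double[] d) {
    for (int j = 0; j < capacity.length; j++) {
      if (consumed[j] + d[j] > capacity[j]) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    DrfAllocationSketch drf = new DrfAllocationSketch(
        new double[] {9, 18},                      // <CPU cores, memory GB>
        new double[][] {{1, 4}, {3, 1}},           // per-task demand of A and B
        new int[] {3, 3});                         // tasks wanted by A and B
    drf.run();
    System.out.println("order (0 = A, 1 = B): " + drf.order);
    System.out.println("A: " + Arrays.toString(drf.allocated[0]));
    System.out.println("B: " + Arrays.toString(drf.allocated[1]));
  }
}
```

For a cluster of 9 cores and 18 GB with per-task demands <1 CPU, 4 GB> and <3 CPU, 1 GB>, the run ends with A holding <3 CPU, 12 GB> and B holding <6 CPU, 2 GB>; B's next task would need 3 more cores and no longer fits. Like the pseudocode, the sketch stops at the first task that does not fit rather than trying other applications.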
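3. Allocation Example
Assume the system has 9 CPU cores and 18 GB of memory. Each task of application A requests <1 CPU, 4 GB>, and each task of application B requests <3 CPU, 1 GB>. Application A has 3 tasks to run and application B has 3 tasks to run.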
Scheduling order | Resources allocated to A | A's dominant share | Resources allocated to B | B's dominant share | Total CPU allocated | Total memory allocated |
---|---|---|---|---|---|---|
A | (1/9, 4/18) | 4/18 | (0, 0) | 0 | 1/9 | 4/18 |
B | (1/9, 4/18) | 4/18 | (3/9, 1/18) | 3/9 | 4/9 | 5/18 |
A | (2/9, 8/18) | 8/18 | (3/9, 1/18) | 3/9 | 5/9 | 9/18 |
B | (2/9, 8/18) | 8/18 | (6/9, 2/18) | 6/9 | 8/9 | 10/18 |
A | (3/9, 12/18) | 12/18 | (6/9, 2/18) | 6/9 | 9/9 | 14/18 |
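4. YARN Source Code
The code below is the DRF policy implemented in the Fair Scheduler, found in the hadoop-yarn-server-resourcemanager module of Hadoop 2.0 YARN. (YARN's main schedulers are the Capacity Scheduler and the Fair Scheduler. DRF is one of the Fair Scheduler's policies, alongside FIFO and Fair Share.)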
package org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies;

@Private
@Unstable
public class DominantResourceFairnessPolicy extends SchedulingPolicy {

  public static final String NAME = "DRF";

  private static final DominantResourceFairnessComparator COMPARATOR =
      new DominantResourceFairnessComparator();
  private static final DominantResourceCalculator CALCULATOR =
      new DominantResourceCalculator();

  @Override
  public String getName() {
    return NAME;
  }

  @Override
  public byte getApplicableDepth() {
    return SchedulingPolicy.DEPTH_ANY;
  }

  @Override
  public Comparator<Schedulable> getComparator() {
    return COMPARATOR;
  }

  @Override
  public ResourceCalculator getResourceCalculator() {
    return CALCULATOR;
  }

  @Override
  public void computeShares(Collection<? extends Schedulable> schedulables,
      Resource totalResources) {
    for (ResourceType type : ResourceType.values()) {
      ComputeFairShares.computeShares(schedulables, totalResources, type);
    }
  }

  @Override
  public void computeSteadyShares(Collection<? extends FSQueue> queues,
      Resource totalResources) {
    for (ResourceType type : ResourceType.values()) {
      ComputeFairShares.computeSteadyShares(queues, totalResources, type);
    }
  }

  @Override
  public boolean checkIfUsageOverFairShare(Resource usage, Resource fairShare) {
    return !Resources.fitsIn(usage, fairShare);
  }

  @Override
  public boolean checkIfAMResourceUsageOverLimit(Resource usage, Resource maxAMResource) {
    return !Resources.fitsIn(usage, maxAMResource);
  }

  @Override
  public Resource getHeadroom(Resource queueFairShare, Resource queueUsage,
      Resource maxAvailable) {
    long queueAvailableMemory =
        Math.max(queueFairShare.getMemorySize() - queueUsage.getMemorySize(), 0);
    int queueAvailableCPU =
        Math.max(queueFairShare.getVirtualCores() - queueUsage
            .getVirtualCores(), 0);
    Resource headroom = Resources.createResource(
        Math.min(maxAvailable.getMemorySize(), queueAvailableMemory),
        Math.min(maxAvailable.getVirtualCores(),
            queueAvailableCPU));
    return headroom;
  }

  @Override
  public void initialize(Resource clusterCapacity) {
    COMPARATOR.setClusterCapacity(clusterCapacity);
  }

  public static class DominantResourceFairnessComparator implements Comparator<Schedulable> {
    private static final int NUM_RESOURCES = ResourceType.values().length;

    private Resource clusterCapacity;

    public void setClusterCapacity(Resource clusterCapacity) {
      this.clusterCapacity = clusterCapacity;
    }

    @Override
    public int compare(Schedulable s1, Schedulable s2) {
      ResourceWeights sharesOfCluster1 = new ResourceWeights();
      ResourceWeights sharesOfCluster2 = new ResourceWeights();
      ResourceWeights sharesOfMinShare1 = new ResourceWeights();
      ResourceWeights sharesOfMinShare2 = new ResourceWeights();
      ResourceType[] resourceOrder1 = new ResourceType[NUM_RESOURCES];
      ResourceType[] resourceOrder2 = new ResourceType[NUM_RESOURCES];

      // Calculate shares of the cluster for each resource both schedulables.
      calculateShares(s1.getResourceUsage(),
          clusterCapacity, sharesOfCluster1, resourceOrder1, s1.getWeights());
      calculateShares(s1.getResourceUsage(),
          s1.getMinShare(), sharesOfMinShare1, null, ResourceWeights.NEUTRAL);
      calculateShares(s2.getResourceUsage(),
          clusterCapacity, sharesOfCluster2, resourceOrder2, s2.getWeights());
      calculateShares(s2.getResourceUsage(),
          s2.getMinShare(), sharesOfMinShare2, null, ResourceWeights.NEUTRAL);

      // A queue is needy for its min share if its dominant resource
      // (with respect to the cluster capacity) is below its configured min share
      // for that resource
      boolean s1Needy = sharesOfMinShare1.getWeight(resourceOrder1[0]) < 1.0f;
      boolean s2Needy = sharesOfMinShare2.getWeight(resourceOrder2[0]) < 1.0f;

      int res = 0;
      if (!s2Needy && !s1Needy) {
        res = compareShares(sharesOfCluster1, sharesOfCluster2,
            resourceOrder1, resourceOrder2);
      } else if (s1Needy && !s2Needy) {
        res = -1;
      } else if (s2Needy && !s1Needy) {
        res = 1;
      } else { // both are needy below min share
        res = compareShares(sharesOfMinShare1, sharesOfMinShare2,
            resourceOrder1, resourceOrder2);
      }
      if (res == 0) {
        // Apps are tied in fairness ratio. Break the tie by submit time.
        res = (int)(s1.getStartTime() - s2.getStartTime());
      }
      return res;
    }

    /**
     * Calculates and orders a resource's share of a pool in terms of two vectors.
     * The shares vector contains, for each resource, the fraction of the pool that
     * it takes up. The resourceOrder vector contains an ordering of resources
     * by largest share. So if resource=<10 MB, 5 CPU>, and pool=<100 MB, 10 CPU>,
     * shares will be [.1, .5] and resourceOrder will be [CPU, MEMORY].
     */
    void calculateShares(Resource resource, Resource pool,
        ResourceWeights shares, ResourceType[] resourceOrder, ResourceWeights weights) {
      shares.setWeight(MEMORY, (float)resource.getMemorySize() /
          (pool.getMemorySize() * weights.getWeight(MEMORY)));
      shares.setWeight(CPU, (float)resource.getVirtualCores() /
          (pool.getVirtualCores() * weights.getWeight(CPU)));
      // sort order vector by resource share
      if (resourceOrder != null) {
        if (shares.getWeight(MEMORY) > shares.getWeight(CPU)) {
          resourceOrder[0] = MEMORY;
          resourceOrder[1] = CPU;
        } else {
          resourceOrder[0] = CPU;
          resourceOrder[1] = MEMORY;
        }
      }
    }

    private int compareShares(ResourceWeights shares1, ResourceWeights shares2,
        ResourceType[] resourceOrder1, ResourceType[] resourceOrder2) {
      for (int i = 0; i < resourceOrder1.length; i++) {
        int ret = (int)Math.signum(shares1.getWeight(resourceOrder1[i])
            - shares2.getWeight(resourceOrder2[i]));
        if (ret != 0) {
          return ret;
        }
      }
      return 0;
    }
  }
}
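The comparator's core idea, ordering each schedulable's shares from largest to smallest and then comparing them position by position, can be isolated in a standalone sketch. The code below is a simplified illustration and not part of YARN: the class `ShareOrderingSketch` and its methods are hypothetical, resources are plain `double` arrays ordered as <memory, CPU>, and the weights and min-share ("needy") handling of the real comparator are deliberately left out.

```java
import java.util.Arrays;

/** Minimal sketch of the share-ordering idea used when comparing two schedulables. */
final class ShareOrderingSketch {

  /** Per-resource usage fractions sorted descending, so index 0 is the dominant share. */
  static double[] orderedShares(double[] usage, double[] pool) {
    double[] shares = new double[usage.length];
    for (int j = 0; j < usage.length; j++) {
      shares[j] = usage[j] / pool[j];
    }
    Arrays.sort(shares);                           // ascending order
    for (int lo = 0, hi = shares.length - 1; lo < hi; lo++, hi--) {
      double tmp = shares[lo];                     // reverse in place
      shares[lo] = shares[hi];
      shares[hi] = tmp;
    }
    return shares;
  }

  /**
   * Compares dominant shares first, then the next-largest shares.
   * A negative result means a is further below its share and is served first.
   */
  static int compareShares(double[] a, double[] b) {
    for (int i = 0; i < a.length; i++) {
      int c = Double.compare(a[i], b[i]);
      if (c != 0) {
        return c;
      }
    }
    return 0;
  }

  public static void main(String[] args) {
    double[] pool = {100, 10};                     // <100 MB, 10 CPU>, as in the Javadoc
    double[] s1 = orderedShares(new double[] {10, 5}, pool);   // shares .1 and .5
    double[] s2 = orderedShares(new double[] {60, 2}, pool);   // shares .6 and .2
    System.out.println(Arrays.toString(s1) + " vs " + Arrays.toString(s2)
        + " -> " + compareShares(s1, s2));
  }
}
```

With the Javadoc's example of <10 MB, 5 CPU> in a pool of <100 MB, 10 CPU>, the ordered shares start with the CPU share of .5, which is then compared against the other schedulable's dominant share. For the start-time tie-break, a sketch like this would more safely use `Long.compare(startTime1, startTime2)`, since casting a `long` difference to `int` can overflow for large differences.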