WALT scheduling algorithm

lei7143

Last modified 2023-06-15 10:27:20

First published 2023-04-11 12:14:30

Article link: https://blog.csdn.net/lei7143/article/details/130076267


walt_update_task_ravg

update_window_start

update_task_cpu_cycles

update_task_rq_cpu_cycles

update_task_demand(p, rq, event, wallclock)

update_history(rq, p, wts->sum, 1, event)

update_cpu_busy_time(p, rq, event, wallclock, irqtime)

update_top_tasks

static void update_task_pred_demand

The WALT algorithm

WALT load-tracking principles (森森浅浅笙笙's blog, CSDN)

Learning WALT for CPU load balancing [repost] (mb5fdcad0be2e90's tech blog, 51CTO)

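1. A task's demand is the maximum of its contribution to the most recently completed window and its average demand over the past N windows.

WALT "forgets" blocked time entirely: it only accounts runnable and running time, which gives a more accurate measure of how long a task actually executes, and the resulting demand can be used for prediction. Whether an event updates the demand is decided by account_busy_for_task_demand().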
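The demand rule in point 1 can be sketched as below. This is a minimal illustration, not the kernel code: the history length HIST_SIZE (5, matching the 5-window history described later) and the function name task_demand are assumptions, and hist[0] is taken to be the most recently completed window.

```c
#include <stdint.h>

/* Hypothetical history length; the kernel uses RAVG_HIST_SIZE. */
#define HIST_SIZE 5

/*
 * Demand = max(most recent completed window, average over the last
 * HIST_SIZE windows). hist[0] is the most recent window; values are
 * runnable + running time only (blocked time never enters the history).
 */
static uint32_t task_demand(const uint32_t hist[HIST_SIZE])
{
    uint64_t sum = 0;

    for (int i = 0; i < HIST_SIZE; i++)
        sum += hist[i];

    uint32_t avg = (uint32_t)(sum / HIST_SIZE);
    return hist[0] > avg ? hist[0] : avg;
}
```

Taking the maximum lets demand rise immediately when a task becomes busy, while the average keeps it from collapsing after a single idle window.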

WALT rq initialization

linux/sched/walt.h

struct walt_task_struct {

...

};

2. CPU busy time: the sum of execution times of all tasks in the most recently completed window. WALT "forgets" CPU utilization as soon as tasks are taken off the runqueue.

Whether an event updates the CPU busy time is decided by account_busy_for_cpu_time().
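A simplified model of point 2: the CPU's busy time is the sum of per-task contributions to the last completed window, and removing a task from the runqueue removes its contribution at once. The struct, the fixed MAX_TASKS slot array and the function names are hypothetical; the real WALT code tracks this with per-rq runnable sums and migration fixups rather than a per-task table.

```c
#include <stdint.h>

#define MAX_TASKS 8 /* hypothetical fixed-size runqueue, for illustration */

struct cpu_busy {
    uint64_t prev_busy;            /* busy time in the last completed window (ns) */
    uint64_t task_prev[MAX_TASKS]; /* each task's contribution to that window (ns) */
};

/* Account ns of execution by task slot t in the last completed window. */
static void add_task_time(struct cpu_busy *c, int t, uint64_t ns)
{
    c->task_prev[t] += ns;
    c->prev_busy += ns;
}

/* Taking a task off this runqueue "forgets" its contribution immediately. */
static void remove_task(struct cpu_busy *c, int t)
{
    c->prev_busy -= c->task_prev[t];
    c->task_prev[t] = 0;
}
```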

struct walt_task_struct and struct walt_rq are embedded in task_struct and struct rq, respectively.

walt_update_task_ravg

walt_update_task_ravg updates both the task demand and the CPU busy time.

kernel-5.10/kernel_platform/msm-kernel/kernel/sched/walt/walt.c

update_window_start

old_window_start = update_window_start(rq, wallclock, event);

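Updates the current window start in walt_rq based on the current time, wallclock.

If wallclock is N (N >= 1) window periods past walt_rq->window_start, walt_rq->window_start is advanced by N window periods, and prev_window_size is set to the window period.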
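The window advance described above, as a minimal sketch. The struct wrq_sketch and the function advance_window are hypothetical stand-ins for walt_rq and update_window_start(); the real function also handles other cases (for example a negative delta) that are omitted here.

```c
#include <stdint.h>

struct wrq_sketch {
    uint64_t window_start;     /* start of the current window (ns) */
    uint64_t prev_window_size; /* size of the previous window (ns) */
};

/*
 * Returns the old window_start. If wallclock has crossed N >= 1 window
 * boundaries, window_start moves forward by exactly N windows.
 */
static uint64_t advance_window(struct wrq_sketch *rq, uint64_t wallclock,
                               uint64_t window_size)
{
    uint64_t old = rq->window_start;

    if (wallclock < old + window_size)
        return old; /* still inside the current window */

    uint64_t nr_windows = (wallclock - old) / window_size;
    rq->window_start = old + nr_windows * window_size;
    rq->prev_window_size = window_size;
    return old;
}
```

For example, with 20 ms windows starting at 0 and wallclock = 45 ms, two boundaries have been crossed and window_start becomes 40 ms.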

rollover_cpu_window()

update_task_cpu_cycles

    if (!wts->mark_start) {
        update_task_cpu_cycles(p, cpu_of(rq), wallclock);
        goto done;
    }

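If the task's mark_start has not been set yet, the rq cycle count is refreshed from the CPU cycle counter and then copied into the walt_task_struct cycle count.

Here cycles means CPU cycles; the time at which cycles was updated is recorded in last_cc_update (i.e., wallclock).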

update_task_rq_cpu_cycles

Updates the rq and task cycle counts and the window load: (actual execution capacity * time) / (theoretical maximum capacity * time).

update_task_demand(p, rq, event, wallclock)

Reference:

https://www.cnblogs.com/lingjiajun/p/12317090.html

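At this point wrq->window_start has already been updated from wallclock, but wts->mark_start has not.

account_busy_for_task_demand() decides whether the event affects the task's demand (load). If it does not, but a new window has started in the meantime, the window history still has to be updated.

This function is called from only this one place.

When the demand does need updating and no new window has started, add_to_task_demand() is called: it converts the elapsed time into the equivalent time at full capacity and adds it to wts->sum.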
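The busy-task path of update_task_demand, including the case where the interval crosses window boundaries (the tail of the old window, any complete windows, then the remainder in the current window), can be sketched as follows. This is a simplified model under stated assumptions: execution-time scaling is omitted (the CPU is assumed to run at full capacity), the cap on sum at one window mirrors add_to_task_demand, and all names (task_sketch, push_history, add_busy, demand_busy) are hypothetical.

```c
#include <stdint.h>

#define HIST_SIZE 5

struct task_sketch {
    uint64_t mark_start;      /* last time this task was accounted (ns) */
    uint64_t sum;             /* busy time in the window containing mark_start */
    uint64_t hist[HIST_SIZE]; /* hist[0] = most recent completed window */
};

/* Push one completed-window sample; the oldest sample is dropped. */
static void push_history(struct task_sketch *t, uint64_t sample)
{
    for (int i = HIST_SIZE - 1; i > 0; i--)
        t->hist[i] = t->hist[i - 1];
    t->hist[0] = sample;
}

/* add_to_task_demand-like step: accumulate, capped at one window. */
static void add_busy(struct task_sketch *t, uint64_t delta, uint64_t window)
{
    t->sum += delta;
    if (t->sum > window)
        t->sum = window;
}

/* window_start has already been advanced to the window containing wallclock. */
static void demand_busy(struct task_sketch *t, uint64_t window_start,
                        uint64_t wallclock, uint64_t window)
{
    if (t->mark_start >= window_start) {
        /* same window: just accumulate */
        add_busy(t, wallclock - t->mark_start, window);
    } else {
        uint64_t nr_full = (window_start - t->mark_start) / window;
        uint64_t first_boundary = window_start - nr_full * window;

        /* 1) tail of the window that mark_start was in */
        add_busy(t, first_boundary - t->mark_start, window);
        push_history(t, t->sum);
        t->sum = 0;

        /* 2) complete windows in between, each fully busy */
        for (uint64_t i = 0; i < nr_full && i < HIST_SIZE; i++)
            push_history(t, window);

        /* 3) remainder in the current window */
        add_busy(t, wallclock - window_start, window);
    }
    t->mark_start = wallclock;
}
```

With 20-unit windows, mark_start = 15, sum = 3 and wallclock = 45: the old window closes with 3 + 5 = 8, window [20, 40) is pushed as a full 20, and 5 units are carried into the current window.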

update_history(rq, p, wts->sum, 1, event)

There are minor differences between kernel versions.

struct walt_task_struct {

u32 sum_history[RAVG_HIST_SIZE];
u16 sum_history_util[RAVG_HIST_SIZE];

};

runtime_scale = scale_time_to_util(runtime);

The WALT window (20 ms = 20*1000*1000 ns) is divided into 1024 equal units; scale_time_to_util() computes how many of those units the runtime occupies.
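The time-to-units conversion above, as a minimal sketch. WALT_WINDOW_NS and time_to_util_units are hypothetical names; the 20 ms window comes from the text, and the kernel's exact arithmetic and rounding in scale_time_to_util() may differ.

```c
#include <stdint.h>

#define WALT_WINDOW_NS (20ULL * 1000 * 1000) /* 20 ms window, as in the text */
#define SCHED_CAPACITY_SCALE 1024ULL

/* Convert busy time (ns) into 1/1024 units of the window, capped at 1024. */
static uint16_t time_to_util_units(uint64_t runtime_ns)
{
    uint64_t units = runtime_ns * SCHED_CAPACITY_SCALE / WALT_WINDOW_NS;
    return (uint16_t)(units > SCHED_CAPACITY_SCALE ? SCHED_CAPACITY_SCALE : units);
}
```

So a task that ran 10 ms of a 20 ms window occupies 512 units, i.e. half the window.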

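sum_history[] stores the raw runtime of each window;

sum_history_util[] stores the same runtimes expressed in these 1024-based units.

This is similar to the figure above.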

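WALT keeps a history of 5 windows. When a window completes, the entries are shifted by one slot:

hist[3] => hist[4]

hist[2] => hist[3]

hist[1] => hist[2]

...

and then runtime is stored in hist[0].

sum is the sum of the updated hist array, and max is its maximum.

Then, depending on the configured policy, demand is derived from these values (the equivalent load computed over the most recent 5 windows).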
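The shift-then-aggregate step above can be sketched like this. The policy enum and its four choices (most recent window, maximum, average, max of recent and average) are assumptions modeled on the window-stats policies found in WALT sources; the names and the exact set vary by kernel version, and update_history_sketch is a hypothetical name.

```c
#include <stdint.h>

#define RAVG_HIST_SIZE 5

/* Hypothetical policy names; real kernels select this via a sysctl. */
enum window_policy { POLICY_RECENT, POLICY_MAX, POLICY_MAX_RECENT_AVG, POLICY_AVG };

/*
 * Shift hist[] by one slot (the oldest entry is dropped), store runtime in
 * hist[0], then derive demand from the updated history.
 */
static uint32_t update_history_sketch(uint32_t hist[RAVG_HIST_SIZE],
                                      uint32_t runtime, enum window_policy policy)
{
    uint64_t sum = 0;
    uint32_t max = 0;

    for (int i = RAVG_HIST_SIZE - 1; i > 0; i--)
        hist[i] = hist[i - 1];
    hist[0] = runtime;

    for (int i = 0; i < RAVG_HIST_SIZE; i++) {
        sum += hist[i];
        if (hist[i] > max)
            max = hist[i];
    }
    uint32_t avg = (uint32_t)(sum / RAVG_HIST_SIZE);

    switch (policy) {
    case POLICY_RECENT:
        return runtime;
    case POLICY_MAX:
        return max;
    case POLICY_AVG:
        return avg;
    case POLICY_MAX_RECENT_AVG:
    default:
        return runtime > avg ? runtime : avg;
    }
}
```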

update_history

Reference:

https://www.cnblogs.com/lingjiajun/p/12317090.html

pred_demand_scaled is used by EAS. From the current window's sum it derives a bucket index, then searches sum_history_util for the historical util value closest to (bucket index << 6) and returns it.

update_history -> predict_and_update_buckets

SCHED_CAPACITY_SCALE = 1024

SCHED_CAPACITY_SHIFT = 10

NUM_BUSY_BUCKETS = 16

NUM_BUSY_BUCKETS_SHIFT = 4

busy_to_bucket

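It converts wts->sum into window units (out of 1024) and then shifts right by 6 (note that 6 = SCHED_CAPACITY_SHIFT - NUM_BUSY_BUCKETS_SHIFT = 10 - 4).

So the NUM_BUSY_BUCKETS = 16 buckets correspond to the following ranges of window units: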

[960,1024]=15

[896,960)=14

...

[64,128)=1

[0,64)=0
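The bucket mapping in the table above, as a sketch. util_to_bucket is a hypothetical name; it reproduces only the shift-and-clamp described in the text. Some kernel versions additionally adjust the lowest bucket, which is not modeled here.

```c
#include <stdint.h>

#define SCHED_CAPACITY_SHIFT 10
#define NUM_BUSY_BUCKETS 16
#define NUM_BUSY_BUCKETS_SHIFT 4

/* Map a 0..1024 util value to bucket 0..15; each bucket is 64 units wide. */
static int util_to_bucket(uint32_t util)
{
    int bidx = (int)(util >> (SCHED_CAPACITY_SHIFT - NUM_BUSY_BUCKETS_SHIFT));

    /* 1024 >> 6 == 16, so clamp to keep [960, 1024] in bucket 15 */
    return bidx < NUM_BUSY_BUCKETS - 1 ? bidx : NUM_BUSY_BUCKETS - 1;
}
```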

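Here ffs(num) returns the 1-based position of the lowest set bit of num: ffs(0) = 0, ffs(1) = 1, ffs(8) = 4.

That is, it finds the first set position from start upward.

Here dmin << 6 and dmax << 6 convert bucket indices back into window units, the same units as the scaled wts->sum; from the history it then picks the most recent load that falls between dmin and dmax.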
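The prediction lookup above can be sketched as follows. This is an illustrative model, not the kernel's get_pred_busy(): the busy_mask bitmap (bit i set means bucket i was seen recently) stands in for the per-task bucket counters, and the two fallbacks (keep the current util when no bucket at or above start is set, use the bucket midpoint when no history sample falls in range) are assumptions. pred_busy_sketch is a hypothetical name; ffs() is the POSIX function from <strings.h>.

```c
#include <stdint.h>
#include <strings.h> /* ffs() */

#define NUM_BUSY_BUCKETS 16
#define RAVG_HIST_SIZE 5

/* hist_util[] is most-recent-first, in 0..1024 window units. */
static uint32_t pred_busy_sketch(unsigned int busy_mask, int start, uint32_t cur_util,
                                 const uint16_t hist_util[RAVG_HIST_SIZE])
{
    /* ffs() is 1-based and returns 0 when no bit is set */
    int pos = ffs((int)(busy_mask >> start));
    if (pos == 0)
        return cur_util; /* no busier bucket recorded (assumption) */

    int bidx = start + pos - 1;
    uint32_t dmin = (uint32_t)bidx << 6;
    /* the last bucket also covers 1024, i.e. [960, 1024] */
    uint32_t dmax = bidx == NUM_BUSY_BUCKETS - 1 ? 1025 : (uint32_t)(bidx + 1) << 6;

    /* most recent history sample inside [dmin, dmax) */
    for (int i = 0; i < RAVG_HIST_SIZE; i++)
        if (hist_util[i] >= dmin && hist_util[i] < dmax)
            return hist_util[i];

    return (dmin + dmax) / 2; /* midpoint fallback (assumption) */
}
```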

7. update_history -> fixup_walt_sched_stats_common

fixup_walt_sched_stats_common(rq, p, demand_scaled, pred_demand_scaled);

Updates walt_rq->walt_sched_stats directly from the results; what gets accumulated is actually the delta of demand_scaled and the delta of pred_demand_scaled:

    s64 cumulative_runnable_avg_scaled =
        stats->cumulative_runnable_avg_scaled + demand_scaled_delta;
    s64 pred_demands_sum_scaled =
        stats->pred_demands_sum_scaled + pred_demand_scaled_delta;
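The delta-based update above, as a minimal sketch: instead of recomputing the runqueue totals, only the change in the task's values is applied. The struct and function names are hypothetical, and the sketch assumes the task is currently counted in these stats.

```c
#include <stdint.h>

struct sched_stats_sketch {
    int64_t cumulative_runnable_avg_scaled; /* sum of demand_scaled on this rq */
    int64_t pred_demands_sum_scaled;        /* sum of pred_demand_scaled on this rq */
};

struct task_scaled_sketch {
    uint16_t demand_scaled;
    uint16_t pred_demand_scaled;
};

/* Apply only the change in the task's values, then store the new values. */
static void fixup_stats_sketch(struct sched_stats_sketch *stats,
                               struct task_scaled_sketch *t,
                               uint16_t new_demand, uint16_t new_pred)
{
    int64_t demand_delta = (int64_t)new_demand - t->demand_scaled;
    int64_t pred_delta = (int64_t)new_pred - t->pred_demand_scaled;

    stats->cumulative_runnable_avg_scaled += demand_delta;
    stats->pred_demands_sum_scaled += pred_delta;
    t->demand_scaled = new_demand;
    t->pred_demand_scaled = new_pred;
}
```

The deltas are signed because a task's demand can shrink as well as grow.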

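update_history: updates to wts

A summary of Sched_Boost

Scheduler branches: RTG (内核工匠's blog, CSDN)

In later kernel versions the feature kept being adjusted. For example, kernel 5.4 introduced skip_min to replace preferred_cluster, and uses sched_min_task_util_for_colocation to filter out low-load tasks: when a task's load drops below sched_min_task_util_for_colocation, its preference for the big cores during CPU selection can still persist for sched_task_unfilter_period (an amusing name: a period during which the task is not filtered, i.e., a hysteresis delay). Overall, the functionality did not change much.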
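The hysteresis described above can be sketched like this. All names (coloc_task_sketch, unfilter_until_ns, coloc_eligible) are hypothetical; the two parameters stand in for sched_min_task_util_for_colocation and sched_task_unfilter_period, and the real kernel logic may be structured differently.

```c
#include <stdbool.h>
#include <stdint.h>

struct coloc_task_sketch {
    uint64_t unfilter_until_ns; /* hypothetical: end of the "not filtered" grace period */
};

/*
 * Returns true if the task should still prefer the big cores.
 * At or above the threshold the grace period is refreshed; below it,
 * the preference survives only until the grace period expires.
 */
static bool coloc_eligible(struct coloc_task_sketch *t, uint32_t util, uint64_t now_ns,
                           uint32_t min_util_for_colocation, uint64_t unfilter_period_ns)
{
    if (util >= min_util_for_colocation) {
        t->unfilter_until_ns = now_ns + unfilter_period_ns;
        return true;
    }
    return now_ns < t->unfilter_until_ns;
}
```

The delay keeps a task whose load briefly dips from bouncing between little and big cores.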

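This covers the complete windows and the remaining last window.

update_cpu_busy_time(p, rq, event, wallclock, irqtime)

[Kernel scheduling, load computation][WALT][update_cpu_busy_time] (money_yuan's blog, CSDN)

update_cpu_busy_time: the case where no new window has started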

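Qualcomm platforms that support CONFIG_ARM_QCOM_CPUFREQ_HW use cpu_cycle_counter. Without cpu_cycle_counter, task_exec_scale is derived from the current frequency:

wrq->task_exec_scale = DIV64_U64_ROUNDUP(cpu_cur_freq(cpu) *arch_scale_cpu_capacity(cpu), wrq->cluster->max_possible_freq);

With cpu_cycle_counter, the CPU cycles elapsed since the last update and the corresponding elapsed time are computed: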

cycles_delta = cur_cycles - wts->cpu_cycles;

cycles_delta = cycles_delta * NSEC_PER_MSEC;

time_delta = wallclock - wts->mark_start;

wrq->task_exec_scale = DIV64_U64_ROUNDUP(cycles_delta *arch_scale_cpu_capacity(cpu),
time_delta *wrq->cluster->max_possible_freq);
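Both ways of computing task_exec_scale shown above, as a sketch in plain C. The function names are hypothetical and div_round_up stands in for DIV64_U64_ROUNDUP. Frequencies are assumed to be in kHz, which makes the NSEC_PER_MSEC factor line up: cycles * NSEC_PER_MSEC / ns is cycles per millisecond, i.e. kHz.

```c
#include <stdint.h>

#define NSEC_PER_MSEC 1000000ULL

static uint64_t div_round_up(uint64_t n, uint64_t d)
{
    return (n + d - 1) / d;
}

/* Without a cycle counter: scale from the current frequency (kHz). */
static uint64_t exec_scale_from_freq(uint64_t cur_freq_khz, uint64_t cpu_capacity,
                                     uint64_t max_possible_freq_khz)
{
    return div_round_up(cur_freq_khz * cpu_capacity, max_possible_freq_khz);
}

/* With a cycle counter: average frequency over [mark_start, wallclock]. */
static uint64_t exec_scale_from_cycles(uint64_t cycles_delta, uint64_t time_delta_ns,
                                       uint64_t cpu_capacity, uint64_t max_possible_freq_khz)
{
    return div_round_up(cycles_delta * NSEC_PER_MSEC * cpu_capacity,
                        time_delta_ns * max_possible_freq_khz);
}
```

For a CPU with capacity 1024 running at 1.2 GHz out of a 2.4 GHz maximum, both paths give 512; the cycle-counter path is preferred because it captures the average frequency actually delivered, including changes within the interval.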

scale_exec_time

The WALT demand (boost) computation is very similar to scale_exec_time: (curr_freq / cpuinfo.max) * (max_cap / 1024), where curr_freq is the CPU's current frequency and max_cap is the CPU's capacity.
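Applying that factor to a time interval can be sketched as below. scale_exec_time_sketch is a hypothetical name; it assumes task_exec_scale already equals (curr_freq / max_freq) * max_cap in 1/1024 units, so the shift by SCHED_CAPACITY_SHIFT supplies the final division by 1024.

```c
#include <stdint.h>

#define SCHED_CAPACITY_SHIFT 10

/* Equivalent full-capacity time: delta * task_exec_scale / 1024. */
static uint64_t scale_exec_time_sketch(uint64_t delta_ns, uint64_t task_exec_scale)
{
    return (delta_ns * task_exec_scale) >> SCHED_CAPACITY_SHIFT;
}
```

For example, a CPU with capacity 512 running at half its maximum frequency has task_exec_scale = 0.5 * 512 = 256, so 10 ms of wall time counts as 2.5 ms of full-capacity work.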

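Task util is computed very much like scale_time_to_util:

1024/window_size: the window (20 ms) is divided into 1024 units, so multiplying a time by 1024/window_size gives the number of units it occupies.

For a task that is not the current task:

Compute its contribution to the previous window:

If the interval crosses only one window boundary, window_start - mark_start is the time spent in the previous window;

If it crosses multiple windows, the time in the most recent previous window is the full window size, window_size.

Compute its contribution to the current window:

The time in the current window is wallclock - window_start.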
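The split described above, as a sketch. It assumes mark_start < window_start (at least one boundary crossed) and omits the execution-time scaling the kernel applies to each part; struct contrib and split_busy are hypothetical names.

```c
#include <stdint.h>

struct contrib {
    uint64_t prev; /* time in the most recent completed window (ns) */
    uint64_t curr; /* time in the current window (ns) */
};

/* Requires mark_start < window_start; window_start is already advanced. */
static struct contrib split_busy(uint64_t mark_start, uint64_t window_start,
                                 uint64_t wallclock, uint64_t window_size)
{
    struct contrib c;
    uint64_t prev_window_start = window_start - window_size;

    if (mark_start >= prev_window_start)
        c.prev = window_start - mark_start; /* crossed exactly one boundary */
    else
        c.prev = window_size;               /* crossed several: previous window fully busy */

    c.curr = wallclock - window_start;
    return c;
}
```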

update_top_tasks

With a 16 ms window, NUM_LOAD_INDICES is 1000:

#define DEFAULT_SCHED_RAVG_WINDOW 16000000

sched_load_granule = DEFAULT_SCHED_RAVG_WINDOW / NUM_LOAD_INDICES;
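With these values each top-task index covers 16000000 / 1000 = 16000 ns (16 us) of window load. The sketch below maps a load to its index and clamps to the last slot; load_to_index_sketch is a hypothetical name for illustration, and the kernel helper that does this may differ in detail.

```c
#include <stdint.h>

#define DEFAULT_SCHED_RAVG_WINDOW 16000000U
#define NUM_LOAD_INDICES 1000U

/* 16000 ns per index */
static const uint32_t sched_load_granule = DEFAULT_SCHED_RAVG_WINDOW / NUM_LOAD_INDICES;

/* Map a task's window load (ns) to one of NUM_LOAD_INDICES slots. */
static uint32_t load_to_index_sketch(uint32_t load)
{
    uint32_t index = load / sched_load_granule;
    return index < NUM_LOAD_INDICES - 1 ? index : NUM_LOAD_INDICES - 1;
}
```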

static void update_task_pred_demand

Comparison of the algorithms:

Purpose          | CFS                 | EAS
task placement   | task load (PELT)    | task demand (WALT)
balance          | task load (PELT)    | task load (PELT)
cpufreq          | task utility (PELT) | task utility (WALT)
