An In-Depth Look at Docker's Underlying Technology, cgroup Series (4): The cpu cgroup Subsystem

LoneHugo

Article link: https://blog.csdn.net/Vince_/article/details/88941586

Date	Kernel version	CPU architecture	Author
2019.04.06	Linux-4.4	PowerPC	LoneHugo

Series index: https://blog.csdn.net/Vince_/article/details/89070001

Definition of the CPU subsystem

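The CPU subsystem is the cgroup subsystem that limits and allocates CPU resources. It is part of the kernel scheduler, is defined in kernel/sched/core.c, and is compiled only when CONFIG_CGROUP_SCHED is enabled. It works through two scheduling classes: CFS (Completely Fair Scheduler) and RT (Real-Time Scheduler). In Linux, CFS schedules normal-priority tasks. Rather than handing out fixed time slices, it tracks a virtual runtime for each task that advances more slowly for higher-priority (heavier-weight) tasks, and it always picks the task with the smallest virtual runtime, so higher-priority tasks receive a larger proportion of CPU time. RT is the scheduling class for real-time tasks: SCHED_FIFO tasks run until they block, yield, or are preempted by a higher-priority task, while SCHED_RR tasks of equal priority are additionally rotated round-robin.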
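
To illustrate why a higher-priority task ends up with a larger proportion of CPU time under CFS, here is a minimal sketch of the weighting idea. It assumes the kernel's convention that a nice-0 task has weight 1024; the function name vruntime_delta_ns is hypothetical and the kernel's real arithmetic uses precomputed inverse weights rather than a plain division.

```c
#include <assert.h>
#include <stdint.h>

/* Weight of a nice-0 task in the kernel's nice-to-weight table. */
#define NICE_0_WEIGHT 1024ULL

/*
 * Simplified model of how CFS charges runtime: virtual runtime advances
 * by the real runtime scaled by NICE_0_WEIGHT / weight. A heavier
 * (higher-priority) task therefore accumulates virtual runtime more
 * slowly and is selected again sooner.
 */
uint64_t vruntime_delta_ns(uint64_t exec_ns, uint64_t weight)
{
	assert(weight > 0); /* real task weights are never zero */
	return exec_ns * NICE_0_WEIGHT / weight;
}
```

With this model, a task of weight 2048 that runs for 1 ms is charged only 0.5 ms of virtual runtime, so over time it runs about twice as long as a weight-1024 task.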

The CPU subsystem is declared as follows:

struct cgroup_subsys cpu_cgrp_subsys = {
	.css_alloc	= cpu_cgroup_css_alloc,
	.css_online	= cpu_cgroup_css_online,
	.css_released	= cpu_cgroup_css_released,
	.css_free	= cpu_cgroup_css_free,
	.fork		= cpu_cgroup_fork,
	.can_attach	= cpu_cgroup_can_attach,
	.attach		= cpu_cgroup_attach,
	.legacy_cftypes	= cpu_files,
	.early_init	= 1,
};

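This initializes several members of struct cgroup_subsys, including the user-facing control files it creates and the handler functions bound to them. The details are covered below.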

cftype file interfaces

static struct cftype cpu_files[] = {
#ifdef CONFIG_FAIR_GROUP_SCHED
	{
		.name = "shares",
		.read_u64 = cpu_shares_read_u64,
		.write_u64 = cpu_shares_write_u64,
	},
#endif
#ifdef CONFIG_CFS_BANDWIDTH
	{
		.name = "cfs_quota_us",
		.read_s64 = cpu_cfs_quota_read_s64,
		.write_s64 = cpu_cfs_quota_write_s64,
	},
	{
		.name = "cfs_period_us",
		.read_u64 = cpu_cfs_period_read_u64,
		.write_u64 = cpu_cfs_period_write_u64,
	},
	{
		.name = "stat",
		.seq_show = cpu_stats_show,
	},
#endif
#ifdef CONFIG_RT_GROUP_SCHED
	{
		.name = "rt_runtime_us",
		.read_s64 = cpu_rt_runtime_read,
		.write_s64 = cpu_rt_runtime_write,
	},
	{
		.name = "rt_period_us",
		.read_u64 = cpu_rt_period_read_uint,
		.write_u64 = cpu_rt_period_write_uint,
	},
#endif
	{ }	/* terminate */
};

These interface files expose the parameters of the two scheduling classes mentioned earlier, CFS and RT respectively.
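
The names in cpu_files are not the final file names: on a legacy (v1) hierarchy the cgroup core prefixes each one with the subsystem name, which is why "shares" appears as cpu.shares. The sketch below builds such a path. The mount point /sys/fs/cgroup/cpu is an assumption (a common convention, not guaranteed on every system), and cpu_cgroup_file_path is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Build the path of a cpu controller file for a given cgroup.
 * Assumes a cgroup v1 hierarchy mounted at /sys/fs/cgroup/cpu.
 * Returns the snprintf result: the length needed, or negative on error.
 */
int cpu_cgroup_file_path(char *buf, size_t size,
			 const char *group, const char *cft_name)
{
	assert(buf != NULL && group != NULL && cft_name != NULL);
	/* The cgroup core adds the "cpu." prefix to each cftype name. */
	return snprintf(buf, size, "/sys/fs/cgroup/cpu/%s/cpu.%s",
			group, cft_name);
}
```

Calling it with group "math" and name "shares" would yield /sys/fs/cgroup/cpu/math/cpu.shares under the stated assumption.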

cpu.shares

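Specifies the CPU share available to the tasks in a cgroup. The value is relative: for example, with three cgroups attached to the CPU subsystem, math, finance, and physics, whose cpu.shares are set to 256, 256, and 512, the tasks in the three cgroups receive CPU time in the ratio 1:1:2. The ratio only takes effect when the groups compete for the CPU; if the other groups are idle, a single group can use the whole CPU.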
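
The 1:1:2 ratio follows from dividing each group's cpu.shares by the sum over all competing sibling groups. A minimal sketch of that arithmetic, assuming every group is busy (the function name cpu_share_fraction is hypothetical):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Fraction of CPU time that group `idx` receives when every sibling
 * group is busy: its cpu.shares divided by the sum of all siblings'
 * cpu.shares.
 */
double cpu_share_fraction(const unsigned long *shares, size_t n, size_t idx)
{
	unsigned long total = 0;

	assert(shares != NULL && idx < n);
	for (size_t i = 0; i < n; i++)
		total += shares[i];
	assert(total > 0);
	return (double)shares[idx] / (double)total;
}
```

For shares of 256, 256, and 512 this gives 0.25, 0.25, and 0.5.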

cpu.cfs_period_us

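Specifies, in microseconds (µs or us), the period over which CPU time is allocated. It is used together with cpu.cfs_quota_us to define how much of a single CPU the tasks in the cgroup may consume. For example, to let the tasks in a cgroup use 0.2 s of a single CPU in every 1 s, set cpu.cfs_quota_us to 200000 and cpu.cfs_period_us to 1000000. The period has an upper limit of 1 s and a lower limit of 1000 µs (1 ms); a finite quota must also be at least 1000 µs.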
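
The accepted range can be sketched as a small check. This is a simplified model, assuming the 1 ms and 1 s bounds that the kernel source defines as min_cfs_quota_period and max_cfs_quota_period; the macro and function names here are hypothetical, and the kernel performs further checks that are omitted.

```c
#include <errno.h>

#define CFS_PERIOD_MIN_US 1000LL    /* 1 ms */
#define CFS_PERIOD_MAX_US 1000000LL /* 1 s  */

/*
 * Simplified model of the range check on CFS bandwidth settings.
 * A negative quota_us means "no limit".
 * Returns 0 if the pair is acceptable, -EINVAL otherwise.
 */
int cfs_bandwidth_check(long long quota_us, long long period_us)
{
	if (period_us < CFS_PERIOD_MIN_US || period_us > CFS_PERIOD_MAX_US)
		return -EINVAL;
	if (quota_us >= 0 && quota_us < CFS_PERIOD_MIN_US)
		return -EINVAL;
	return 0;
}
```

Under this model the 200000/1000000 example is accepted, while a period of 2000000 or a quota of 500 is rejected.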

cpu.cfs_quota_us

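Specifies, in microseconds, how much CPU time the tasks in the cgroup may consume within one period; the period is set by cpu.cfs_period_us. Once the tasks in the group have used up their quota, they are throttled for the rest of the current period and receive a fresh quota in the next one. Because quota and period are expressed relative to a single CPU, letting a cgroup fully use two CPUs requires a quota of 200000 with a period of 100000. Setting cpu.cfs_quota_us to -1 means the cgroup has no CPU time quota; this is the default for every cgroup except the root cgroup.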
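
The relationship between the two values can be expressed as a single division. A minimal sketch, with the hypothetical helper cfs_cpu_limit returning how many CPUs' worth of time the group may use per period:

```c
#include <assert.h>

/*
 * Number of CPUs' worth of time a cgroup may consume per period.
 * Returns a negative value when quota_us is negative (no limit).
 */
double cfs_cpu_limit(long long quota_us, long long period_us)
{
	assert(period_us > 0);
	if (quota_us < 0)
		return -1.0;
	return (double)quota_us / (double)period_us;
}
```

A quota of 200000 with a period of 100000 yields 2.0 (two full CPUs), while 200000 with 1000000 yields 0.2 of one CPU.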

cpu.rt_period_us

Applies only to real-time tasks. Specifies, in microseconds, the period over which real-time CPU time is allocated.

cpu.rt_runtime_us

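Applies only to real-time tasks. Specifies, in microseconds, the longest continuous stretch of CPU time the tasks in the cgroup may use within one period. Setting it prevents the tasks of a single cgroup from monopolizing the CPU. The CPU time actually available scales with the number of CPUs. For example, with cpu.rt_period_us set to 1000000 and cpu.rt_runtime_us set to 200000, tasks in the cgroup can use 0.4 s of CPU time per period on a 2-CPU system and 0.8 s on a 4-CPU system.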
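
The scaling with CPU count is a plain multiplication. A minimal sketch of the arithmetic (the helper name rt_total_runtime_us is hypothetical):

```c
#include <assert.h>

/*
 * Total real-time CPU time, in microseconds, available to a cgroup per
 * period when the per-CPU budget applies on each of `ncpus` CPUs.
 */
long long rt_total_runtime_us(long long rt_runtime_us, unsigned int ncpus)
{
	assert(rt_runtime_us >= 0);
	return rt_runtime_us * (long long)ncpus;
}
```

With cpu.rt_runtime_us at 200000 this gives 400000 µs (0.4 s) on two CPUs and 800000 µs (0.8 s) on four.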

The rt_runtime_us parameter

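The value written is carried in the rt_runtime member of struct rt_schedulable_data while it is being validated, and is then stored in the task_group's rt_bandwidth.rt_runtime.
The rt_runtime member is referenced in the following places: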

__sched_setscheduler

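When a scheduling policy is being set, if real-time bandwidth control is enabled and the requested policy is a real-time one, the task's task_group is checked. If its rt_bandwidth.rt_runtime is 0 and the group is not an autogroup, the call fails with -EPERM: a real-time task cannot be placed in a task_group whose rt_runtime is 0.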

#ifdef CONFIG_RT_GROUP_SCHED
    /*
     * Do not allow realtime tasks into groups that have no runtime
     * assigned.
     */
    if (rt_bandwidth_enabled() && rt_policy(policy) &&
            task_group(p)->rt_bandwidth.rt_runtime == 0 &&
            !task_group_is_autogroup(task_group(p))) {
        task_rq_unlock(rq, p, &flags);
        return -EPERM;
    }
#endif

sched_init

The rt_runtime member is set during scheduler initialization. sched_init is called from start_kernel, after mm_init.

rq->rt.rt_runtime = def_rt_bandwidth.rt_runtime;

cpu_rt_runtime_write

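This is the write handler for the cgroup control file cpu.rt_runtime_us. It calls sched_group_set_rt_runtime and then tg_set_rt_bandwidth to set the rt_bandwidth.rt_runtime of the task_group. Before the value is stored, __rt_schedulable is called to decide whether the setting is permissible. The check walks the hierarchy starting from root_task_group and verifies that the rt_bandwidth of every task_group is consistent with that of its child task_groups. Because the existing configuration is already valid, it suffices to substitute the proposed parameters for the task_group being modified as the walk reaches it; if the check passes, the new value is applied.
The walk is done by walk_tg_tree, which visits every group and runs the tg_rt_schedulable check on each.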
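
The idea behind the check can be sketched outside the kernel as a recursive tree walk. This is a simplified model, not the kernel code: struct tg_model and rt_schedulable are hypothetical names, the handling of unlimited runtime and of the global sched_rt_runtime_us limit is omitted, and only two rules are modeled (runtime must not exceed period, and the children's runtime/period ratios must not sum to more than the parent's). The fixed-point ratio scaled by 2^20 mirrors the kernel's to_ratio helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the kernel's task_group. */
struct tg_model {
	uint64_t rt_period_us;
	uint64_t rt_runtime_us;
	struct tg_model *children; /* first child  */
	struct tg_model *sibling;  /* next sibling */
};

/* Fixed-point runtime/period ratio, scaled by 2^20. */
static uint64_t to_ratio(uint64_t period, uint64_t runtime)
{
	return period ? (runtime << 20) / period : 0;
}

/*
 * Return 1 if the tree rooted at `tg` stays schedulable once `target`
 * takes (new_period, new_runtime), else 0. The proposed values are
 * substituted whenever the walk reaches `target`.
 */
int rt_schedulable(const struct tg_model *tg, const struct tg_model *target,
		   uint64_t new_period, uint64_t new_runtime)
{
	uint64_t period = tg == target ? new_period : tg->rt_period_us;
	uint64_t runtime = tg == target ? new_runtime : tg->rt_runtime_us;
	uint64_t sum = 0;

	assert(tg != NULL);
	if (runtime > period)
		return 0;
	for (const struct tg_model *c = tg->children; c; c = c->sibling) {
		uint64_t cp = c == target ? new_period : c->rt_period_us;
		uint64_t cr = c == target ? new_runtime : c->rt_runtime_us;

		sum += to_ratio(cp, cr);
		if (!rt_schedulable(c, target, new_period, new_runtime))
			return 0;
	}
	/* Children together may not exceed the parent's own ratio. */
	return sum <= to_ratio(period, runtime);
}
```

For a root with period 1000000 and runtime 950000 and two children at 300000 each, raising one child to 600000 passes under this model, while raising it to 700000 fails because the children would then need more than the root's 95%.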