CPU core counts have grown from single core to dual core, then 4, 8, even 10 cores. On Android, however, multi-core SoCs use a big.LITTLE design: little cores and big cores, and on the newest chips an additional prime (super-big) core. Big and little cores are distinguished because their performance (compute capacity) and power consumption differ, and they are grouped into clusters (little cores in one cluster, big cores in another); currently, CPU frequency within a cluster is scaled in lockstep.
Task scheduling therefore has to distinguish between these CPUs to balance performance and power.
To model the CPU topology, the kernel builds corresponding scheduling domains and scheduling groups. Take an 8-core CPU as an example (see the figure below):
At the DIE level: cpu 0-7 form one group
At the MC level: cpu 0-3 are in one group, cpu 4-7 in another
*With SMT (hyper-threading), there would be one more split below the MC level: 01, 23, 45, 67 (this can be ignored here, since current ARM platforms do not support SMT)
![](https://i-blog.csdnimg.cn/blog_migrate/c3ab5d0ae37d61dc5763360847957b6e.jpeg)
Building the CPU Topology
The kernel contains CPU topology code that builds this structure. The structure itself is defined in the DTS and varies by platform. The relevant DTS on my current MTK platform is shown below (the reason this isn't a QCOM platform is simply that my company only seems to have MTK hardware at the moment, so there may be slight differences):
cpu0: cpu@000 {
device_type = "cpu";
compatible = "arm,cortex-a53";
reg = <0x000>;
enable-method = "psci";
clock-frequency = <2301000000>;
operating-points-v2 = <&cluster0_opp>;
dynamic-power-coefficient = <275>;
capacity-dmips-mhz = <1024>;
cpu-idle-states = <&STANDBY &MCDI_CPU &MCDI_CLUSTER>,
<&SODI &SODI3 &DPIDLE &SUSPEND>;
};
cpu1: cpu@001 {
device_type = "cpu";
compatible = "arm,cortex-a53";
reg = <0x001>;
enable-method = "psci";
clock-frequency = <2301000000>;
operating-points-v2 = <&cluster0_opp>;
dynamic-power-coefficient = <275>;
capacity-dmips-mhz = <1024>;
cpu-idle-states = <&STANDBY &MCDI_CPU &MCDI_CLUSTER>,
<&SODI &SODI3 &DPIDLE &SUSPEND>;
};
cpu2: cpu@002 {
device_type = "cpu";
compatible = "arm,cortex-a53";
reg = <0x002>;
enable-method = "psci";
clock-frequency = <2301000000>;
operating-points-v2 = <&cluster0_opp>;
dynamic-power-coefficient = <275>;
capacity-dmips-mhz = <1024>;
cpu-idle-states = <&STANDBY &MCDI_CPU &MCDI_CLUSTER>,
<&SODI &SODI3 &DPIDLE &SUSPEND>;
};
cpu3: cpu@003 {
device_type = "cpu";
compatible = "arm,cortex-a53";
reg = <0x003>;
enable-method = "psci";
clock-frequency = <2301000000>;
operating-points-v2 = <&cluster0_opp>;
dynamic-power-coefficient = <275>;
capacity-dmips-mhz = <1024>;
cpu-idle-states = <&STANDBY &MCDI_CPU &MCDI_CLUSTER>,
<&SODI &SODI3 &DPIDLE &SUSPEND>;
};
cpu4: cpu@100 {
device_type = "cpu";
compatible = "arm,cortex-a53";
reg = <0x100>;
enable-method = "psci";
clock-frequency = <1800000000>;
operating-points-v2 = <&cluster1_opp>;
dynamic-power-coefficient = <85>;
capacity-dmips-mhz = <801>;
cpu-idle-states = <&STANDBY &MCDI_CPU &MCDI_CLUSTER>,
<&SODI &SODI3 &DPIDLE &SUSPEND>;
};
cpu5: cpu@101 {
device_type = "cpu";
compatible = "arm,cortex-a53";
reg = <0x101>;
enable-method = "psci";
clock-frequency = <1800000000>;
operating-points-v2 = <&cluster1_opp>;
dynamic-power-coefficient = <85>;
capacity-dmips-mhz = <801>;
cpu-idle-states = <&STANDBY &MCDI_CPU &MCDI_CLUSTER>,
<&SODI &SODI3 &DPIDLE &SUSPEND>;
};
cpu6: cpu@102 {
device_type = "cpu";
compatible = "arm,cortex-a53";
reg = <0x102>;
enable-method = "psci";
clock-frequency = <1800000000>;
operating-points-v2 = <&cluster1_opp>;
dynamic-power-coefficient = <85>;
capacity-dmips-mhz = <801>;
cpu-idle-states = <&STANDBY &MCDI_CPU &MCDI_CLUSTER>,
<&SODI &SODI3 &DPIDLE &SUSPEND>;
};
cpu7: cpu@103 {
device_type = "cpu";
compatible = "arm,cortex-a53";
reg = <0x103>;
enable-method = "psci";
clock-frequency = <1800000000>;
operating-points-v2 = <&cluster1_opp>;
dynamic-power-coefficient = <85>;
capacity-dmips-mhz = <801>;
cpu-idle-states = <&STANDBY &MCDI_CPU &MCDI_CLUSTER>,
<&SODI &SODI3 &DPIDLE &SUSPEND>;
};
cpu-map {
cluster0 {
core0 {
cpu = <&cpu0>;
};
core1 {
cpu = <&cpu1>;
};
core2 {
cpu = <&cpu2>;
};
core3 {
cpu = <&cpu3>;
};
};
cluster1 {
core0 {
cpu = <&cpu4>;
};
core1 {
cpu = <&cpu5>;
};
core2 {
cpu = <&cpu6>;
};
core3 {
cpu = <&cpu7>;
};
};
};
Code paths: drivers/base/arch_topology.c and arch/arm64/kernel/topology.c. The code in this article is from the CAF kernel, msm-5.4.
Part 1: parse the DTS and save each CPU's package_id and core_id into cpu_topology, along with its cpu_scale (cpu_capacity_orig)
kernel_init()
-> kernel_init_freeable()
-> smp_prepare_cpus()
-> init_cpu_topology()
-> parse_dt_topology()
In the DTS, find the "cpus" node and then its "cpu-map" child node;
first parse the cluster nodes inside it;
then normalize the CPU capacities.
static int __init parse_dt_topology(void)
{
struct device_node *cn, *map;
int ret = 0;
int cpu;
cn = of_find_node_by_path("/cpus"); // look up the /cpus node in the DT
if (!cn) {
pr_err("No CPU information found in DT\n");
return 0;
}
/*
* When topology is provided cpu-map is essentially a root
* cluster with restricted subnodes.
*/
map = of_get_child_by_name(cn, "cpu-map"); // look up the cpu-map node under /cpus
if (!map)
goto out;
ret = parse_cluster(map, 0); // (1) parse the cluster structure
if (ret != 0)
goto out_map;
topology_normalize_cpu_scale(); // (2) normalize the cpu capacities
/*
* Check that all cores are in the topology; the SMP code will
* only mark cores described in the DT as possible.
*/
for_each_possible_cpu(cpu)
if (cpu_topology[cpu].package_id == -1)
ret = -EINVAL;
out_map:
of_node_put(map);
out:
of_node_put(cn);
return ret;
}
(1) Parsing the cluster structure
The first do-while loop walks the "cluster<N>" nodes; on this platform it parses cluster0 and cluster1. For each one it calls itself recursively (reusing the same code) to go on and parse the "core" nodes inside.
When parsing the cores, a second do-while loop walks the "core<N>" nodes; this platform has 8 cores in total, each handled further by parse_core.
So the actual parse order is: cluster0 with its core0-3 (cpu0-3), then cluster1 with its core0-3 (cpu4-7).
Each time all cores of a cluster have been parsed and the do-while loop exits, package_id is incremented, so package_id corresponds to the cluster id.
static int __init parse_cluster(struct device_node *cluster, int depth)
{
char name[20];
bool leaf = true;
bool has_cores = false;
struct device_node *c;
static int package_id __initdata;
int core_id = 0;
int i, ret;
/*
* First check for child clusters; we currently ignore any
* information about the nesting of clusters and present the
* scheduler with a flat list of them.
*/
i = 0;
do {
snprintf(name, sizeof(name), "cluster%d", i); // walk cluster0, cluster1, ...; this platform only has cluster0/1
c = of_get_child_by_name(cluster, name); // check whether cpu-map has cluster child nodes
if (c) {
leaf = false;
ret = parse_cluster(c, depth + 1); // recurse into the child cluster; the reused code then goes on to parse its core nodes
of_node_put(c);
if (ret != 0)
return ret;
}
i++;
} while (c);
/* Now check for cores */
i = 0;
do {
snprintf(name, sizeof(name), "core%d", i); // walk core0, core1, ...; this platform has 8 cores
c = of_get_child_by_name(cluster, name); // check whether the cluster has core child nodes
if (c) {
has_cores = true;
if (depth == 0) { // note: execution only gets here via the depth+1 recursion above
pr_err("%pOF: cpu-map children should be clusters\n", // a core directly under cpu-map (depth == 0) is an error
c);
of_node_put(c);
return -EINVAL;
}
if (leaf) { // at depth+1 with leaf == true, we have reached the core level
ret = parse_core(c, package_id, core_id++); // (1-1) parse the core structure
} else {
pr_err("%pOF: Non-leaf cluster with core %s\n",
cluster, name);
ret = -EINVAL;
}
of_node_put(c);
if (ret != 0)
return ret;
}
i++;
} while (c);
if (leaf && !has_cores)
pr_warn("%pOF: empty cluster\n", cluster);
if (leaf) // all cores of this cluster are parsed; the next cluster is coming up, so bump the package id
package_id++; // hence package_id corresponds to the cluster id
return 0;
}
(1-1) Parsing the core structure
Since this platform does not support hyper-threading, the core<N> nodes have no thread<N> children.
Parse all the information in the cpu node, and
update cpu_topology[cpu].package_id and .core_id, recording which core of which cluster each CPU is.
static int __init parse_core(struct device_node *core, int package_id,
int core_id)
{
char name[20];
bool leaf = true;
int i = 0;
int cpu;
struct device_node *t;
do {
snprintf(name, sizeof(name), "thread%d", i); // no SMT support, so the DTS has no thread nodes under the cores
t = of_get_child_by_name(core, name);
if (t) {
leaf = false;
cpu = get_cpu_for_node(t);
if (cpu >= 0) {
cpu_topology[cpu].package_id = package_id;
cpu_topology[cpu].core_id = core_id;
cpu_topology[cpu].thread_id = i;
} else {
pr_err("%pOF: Can't get CPU for thread\n",
t);
of_node_put(t);
return -EINVAL;
}
of_node_put(t);
}
i++;
} while (t);
cpu = get_cpu_for_node(core); // (1-1-1) resolve the cpu node referenced by this core
if (cpu >= 0) {
if (!leaf) {
pr_err("%pOF: Core has both threads and CPU\n",
core);
return -EINVAL;
}
cpu_topology[cpu].package_id = package_id; // save the package id (cluster id) into the cpu_topology array
cpu_topology[cpu].core_id = core_id; // save the core id; note it restarts at 0 per cluster, so it is 0-3 within each cluster, not 0-7
} else if (leaf) {
pr_err("%pOF: Can't get CPU for leaf core\n", core);
return -EINVAL;
}
return 0;
}
(1-1-1) Resolving the cpu node from a core node
Look up the cpu phandle in the core node and map it to the corresponding cpu id,
then parse that CPU core's capacity.
static int __init get_cpu_for_node(struct device_node *node)
{
struct device_node *cpu_node;
int cpu;
cpu_node = of_parse_phandle(node, "cpu", 0); // get the cpu phandle from the core node
if (!cpu_node)
return -1;
cpu = of_cpu_node_to_id(cpu_node); // map the cpu node to its logical cpu id: 0, 1, ...
if (cpu >= 0)
topology_parse_cpu_capacity(cpu_node, cpu); // (1-1-1-1) parse this cpu core's capacity
else
pr_crit("Unable to find CPU node for %pOF\n", cpu_node);
of_node_put(cpu_node);
return cpu;
}
(1-1-1-1) Parsing each CPU core's capacity
First, the capacity-dmips-mhz value is read as the CPU's raw_capacity. This parameter represents the core's compute capacity: the larger the number, the stronger the core. (Checking against the MTK DTS above, this is clearly a big.LITTLE layout; unusually, though, cpu0-3 are the big cores and cpu4-7 the little ones, the reverse of the typical QCOM layout where cpu0-3 are little and cpu4-7 big.)
Here raw_capacity is 1024 for cpu0-3 and 801 for cpu4-7.
bool __init topology_parse_cpu_capacity(struct device_node *cpu_node, int cpu)
{
static bool cap_parsing_failed;
int ret;
u32 cpu_capacity;
if (cap_parsing_failed)
return false;
ret = of_property_read_u32(cpu_node, "capacity-dmips-mhz", // parse the core's capacity; this property is used from kernel 4.19 on
&cpu_capacity);
if (!ret) {
if (!raw_capacity) {
raw_capacity = kcalloc(num_possible_cpus(), // allocate raw_capacity slots for all possible cpus
sizeof(*raw_capacity),
GFP_KERNEL);
if (!raw_capacity) {
cap_parsing_failed = true;
return false;
}
}
capacity_scale = max(cpu_capacity, capacity_scale); // track the largest capacity seen as the scale
raw_capacity[cpu] = cpu_capacity; // raw capacity is just the dmips value from the DTS
pr_debug("cpu_capacity: %pOF cpu_capacity=%u (raw)\n",
cpu_node, raw_capacity[cpu]);
} else {
if (raw_capacity) {
pr_err("cpu_capacity: missing %pOF raw capacity\n",
cpu_node);
pr_err("cpu_capacity: partial information: fallback to 1024 for all CPUs\n");
}
cap_parsing_failed = true;
free_raw_capacity();
}
return !ret;
}
(2) Normalizing the raw capacities
Each CPU core's capacity is normalized so that the maximum value maps to 1024, and a smaller value with original ratio n becomes n*1024.
The step is: raw_capacity * 1024 / capacity_scale, where capacity_scale is simply the largest raw_capacity, which here is 1024.
The normalized capacity is stored in the per-cpu variable cpu_scale; the cpu_capacity_orig and cpu_capacity values used throughout the scheduler are computed from it.
void topology_normalize_cpu_scale(void)
{
u64 capacity;
int cpu;
if (!raw_capacity)
return;
pr_debug("cpu_capacity: capacity_scale=%u\n", capacity_scale);
for_each_possible_cpu(cpu) {
pr_debug("cpu_capacity: cpu=%d raw_capacity=%u\n",
cpu, raw_capacity[cpu]);
capacity = (raw_capacity[cpu] << SCHED_CAPACITY_SHIFT) // normalize so that 100% of the max capacity maps to 1024
/ capacity_scale;
topology_set_cpu_scale(cpu, capacity); // store each cpu's normalized capacity in the per-cpu cpu_scale (cpu_capacity_orig)
pr_debug("cpu_capacity: CPU%d cpu_capacity=%lu\n",
cpu, topology_get_cpu_scale(cpu));
}
}
Part 2: updating the sibling masks
The call path on cpu0:
kernel_init
-> kernel_init_freeable
-> smp_prepare_cpus
-> store_cpu_topology
The call path on cpu1-7:
secondary_start_kernel
-> store_cpu_topology
void store_cpu_topology(unsigned int cpuid)
{
struct cpu_topology *cpuid_topo = &cpu_topology[cpuid];
u64 mpidr;
if (cpuid_topo->package_id != -1) // package_id was already parsed from the DT, so the MPIDR/coprocessor-register fallback below is skipped
goto topology_populated;
mpidr = read_cpuid_mpidr();
/* Uniprocessor systems can rely on default topology values */
if (mpidr & MPIDR_UP_BITMASK)
return;
/*
* This would be the place to create cpu topology based on MPIDR.
*
* However, it cannot be trusted to depict the actual topology; some
* pieces of the a