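1. Kernel documentation
The example above illustrates the following:
A process starts out in cgroup A and is later migrated to cgroup B. Whether the process's memory charges follow it into B is controlled by memory.move_charge_at_immigrate of the destination cgroup. If 1 is written to this file in B, the process's memory charges are removed from cgroup A (uncharge from A) and accounted to cgroup B (charge to B). Strictly speaking, the value is a bitmask: bit 0 (value 1) moves the charges of anonymous pages and their swap, and bit 1 (value 2) moves the charges of file pages mapped by the task.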
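From user space, the procedure above comes down to two writes into the destination cgroup directory. The sketch below assumes a cgroup v1 memory controller mounted at /sys/fs/cgroup/memory, an existing child cgroup B, and root privileges. The helpers write_str and move_task_with_charges are hypothetical names used only for illustration. The order matters: the kernel reads the destination's move_charge_at_immigrate when the task is attached, so the flag is written before the PID.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: write the whole string `s` to `path` in one
 * open/write/close cycle. Returns 0 on success, -1 on failure. */
int write_str(const char *path, const char *s)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    size_t len = strlen(s);
    int ok = fwrite(s, 1, len, f) == len;
    /* fclose() flushes the stdio buffer; for a cgroup control file this is
     * where the kernel's write handler runs and may report an error. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Hypothetical helper: enable charge moving on the destination memory
 * cgroup `cg_dir` (assumed to be a cgroup v1 directory such as
 * /sys/fs/cgroup/memory/B), then move process `pid` into it.
 * `flags` is the bitmask: bit 0 = anonymous pages, bit 1 = mapped file pages. */
int move_task_with_charges(const char *cg_dir, pid_t pid, unsigned int flags)
{
    char path[4096];
    char buf[32];

    assert(cg_dir != NULL && flags <= 3);

    /* Step 1: set the flag on the destination before attaching the task. */
    snprintf(path, sizeof(path), "%s/memory.move_charge_at_immigrate", cg_dir);
    snprintf(buf, sizeof(buf), "%u\n", flags);
    if (write_str(path, buf) != 0)
        return -1;

    /* Step 2: writing the PID is what enters cgroup1_procs_write() in the kernel. */
    snprintf(path, sizeof(path), "%s/cgroup.procs", cg_dir);
    snprintf(buf, sizeof(buf), "%d\n", (int)pid);
    return write_str(path, buf);
}
```

For example, move_task_with_charges("/sys/fs/cgroup/memory/B", 130, 1) writes "130\n" (4 bytes) to B's cgroup.procs, which matches the buffer visible in the vfs_write frames of the backtrace in section 2.1.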
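2. Source code flow
2.1 Call stack for uncharging A and charging B
As mentioned above, when a process migrates between cgroups and the destination cgroup has move_charge_at_immigrate set, the process's memory usage has to be subtracted from the source cgroup and added to the destination cgroup. This is done page by page in mem_cgroup_move_account. The call stack is as follows: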
(gdb) bt
#0 mem_cgroup_move_account (page=0xffffea0000086300, compound=false, from=0xffff888005fba000, to=0xffff888005fb9000) at mm/memcontrol.c:5592
#1 0xffffffff81288da3 in mem_cgroup_move_charge_pte_range (pmd=<optimized out>, addr=8949760, end=8962048, walk=<optimized out>) at mm/memcontrol.c:6072
#2 0xffffffff8123d3eb in walk_pmd_range (walk=<optimized out>, end=<optimized out>, addr=8962048, pud=<optimized out>) at mm/pagewalk.c:89
#3 walk_pud_range (walk=<optimized out>, end=<optimized out>, addr=8962048, p4d=<optimized out>) at mm/pagewalk.c:160
#4 walk_p4d_range (walk=<optimized out>, end=<optimized out>, addr=8962048, pgd=<optimized out>) at mm/pagewalk.c:193
#5 walk_pgd_range (walk=<optimized out>, end=<optimized out>, addr=8962048) at mm/pagewalk.c:229
#6 __walk_page_range (start=8949760, end=<optimized out>, walk=0xffff888005f73d48) at mm/pagewalk.c:331
#7 0xffffffff8123d9f5 in walk_page_range (mm=<optimized out>, start=8949760, end=18446612682170408960, ops=<optimized out>, private=<optimized out>) at mm/pagewalk.c:427
#8 0xffffffff812857b6 in mem_cgroup_move_charge () at mm/memcontrol.c:6145
#9 mem_cgroup_move_task () at mm/memcontrol.c:6155
#10 0xffffffff81182b1c in cgroup_procs_write_finish (task=<optimized out>, locked=<optimized out>) at kernel/cgroup/cgroup.c:2827
#11 0xffffffff8118856d in __cgroup1_procs_write (of=0xffff888005fa46c0, buf=<optimized out>, nbytes=<optimized out>, threadgroup=<optimized out>, off=<optimized out>) at kernel/cgroup/cgroup-v1.c:522
#12 0xffffffff811885ae in cgroup1_procs_write (of=<optimized out>, buf=<optimized out>, nbytes=<optimized out>, off=<optimized out>) at kernel/cgroup/cgroup-v1.c:532
#13 0xffffffff8117fa48 in cgroup_file_write (of=<optimized out>, buf=<optimized out>, nbytes=4, off=<optimized out>) at kernel/cgroup/cgroup.c:3697
#14 0xffffffff81337576 in kernfs_fop_write (file=<optimized out>, user_buf=<optimized out>, count=<optimized out>, ppos=0xffff888005fb9000) at fs/kernfs/file.c:315
#15 0xffffffff81297bfc in vfs_write (pos=<optimized out>, count=4, buf=<optimized out>, file=<optimized out>) at fs/read_write.c:584
#16 vfs_write (file=0xffff888005faff00, buf=0x2806fd0 "130\n", count=<optimized out>, pos=0xffff888005f73ef0) at fs/read_write.c:566
#17 0xffffffff81297eac in ksys_write (fd=<optimized out>, buf=0x2806fd0 "130\n", count=4) at fs/read_write.c:639
#18 0xffffffff81297f35 in __do_sys_write (count=<optimized out>, buf=<optimized out>, fd=<optimized out>) at fs/read_write.c:651
#19 __se_sys_write (count=<optimized out>, buf=<optimized out>, fd=<optimized out>) at fs/read_write.c:648
#20 __x64_sys_write (regs=<optimized out>) at fs/read_write.c:648
#21 0xffffffff81c71608 in do_syscall_64 (nr=<optimized out>, regs=0xffff888005f73f58) at arch/x86/entry/common.c:46
#22 0xffffffff81e0007c in entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:118
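The backtrace captures only the per-page step. In the v5.x sources the full move appears to be three-phase. First, mem_cgroup_can_attach (via mem_cgroup_precharge_mc) counts the movable pages and precharges the destination (mc.precharge). Next, mem_cgroup_move_charge_pte_range calls mem_cgroup_move_account for each page and, on success, decrements mc.precharge and increments mc.moved_charge (the source comment says "we uncharge from mc.from later"). Finally, __mem_cgroup_clear_mc cancels any leftover precharge on the destination and uncharges moved_charge from the source in one batch. The sketch below is a user-space toy model of that bookkeeping, not kernel code. The names memcg_model, page_model, move_ctx and the functions are hypothetical, and a single usage counter stands in for the kernel's page_counter and per-page statistics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory cgroup: only a charged-page counter. */
struct memcg_model {
    long usage;
};

/* Hypothetical stand-in for struct page: only the owning memcg. */
struct page_model {
    struct memcg_model *mem_cgroup;   /* like page->mem_cgroup */
};

/* Per-migration state, modelled loosely on the kernel's static `mc`. */
struct move_ctx {
    struct memcg_model *from, *to;
    long precharge;      /* charged to `to` up front, not yet consumed */
    long moved_charge;   /* pages moved but still charged to `from` */
};

/* Phase 1 (can_attach): count pages owned by `from`, charge them to `to`. */
static void precharge(struct move_ctx *mc, struct page_model *pages, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (pages[i].mem_cgroup == mc->from)
            mc->precharge++;
    mc->to->usage += mc->precharge;
}

/* Phase 2 (per page, like mem_cgroup_move_account): switch the owner. */
static bool move_account(struct page_model *page, struct move_ctx *mc)
{
    if (page->mem_cgroup != mc->from)
        return false;              /* not charged to the source: skip it */
    page->mem_cgroup = mc->to;     /* the kernel assigns page->mem_cgroup = to */
    return true;
}

/* Phase 3 (clear_mc): drop unused precharge, uncharge `from` in one batch. */
static void clear_mc(struct move_ctx *mc)
{
    mc->to->usage -= mc->precharge;
    mc->precharge = 0;
    mc->from->usage -= mc->moved_charge;
    mc->moved_charge = 0;
}

/* Whole flow: precharge, walk the task's pages, then settle the counters. */
void move_charges(struct move_ctx *mc, struct page_model *pages, size_t n)
{
    assert(mc->from != NULL && mc->to != NULL && mc->from != mc->to);
    precharge(mc, pages, n);
    for (size_t i = 0; i < n && mc->precharge > 0; i++) {
        if (move_account(&pages[i], mc)) {
            mc->precharge--;
            mc->moved_charge++;    /* uncharged from `from` later, in clear_mc */
        }
    }
    clear_mc(mc);
}
```

For example, with cgroup A at usage 5, three of the task's four pages owned by A and one owned by some other cgroup, move_charges leaves A at 2, B at 3, and the other page untouched. Batching the uncharge keeps the source charged until the walk has finished, which is why the source's counter drops only in the last phase.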
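2.2 Where task_struct->cgroups is switched over to cgroup B
During the migration one field is of particular interest: page->mem_cgroup. If move_charge_at_immigrate is set, the process has moved into the new cgroup and its memory charges have been added to the destination cgroup, so page->mem_cgroup should also be changed to point to the destination cgroup. That is indeed what happens: mem_cgroup_move_account, at the top of the call stack in section 2.1, performs the assignment page->mem_cgroup = to. The field named in this section's title, the task's own task_struct->cgroups pointer, is switched to the destination css_set separately, in cgroup_move_task (kernel/sched/psi.c), which calls rcu_assign_pointer(task->cgroups, to). The call stack is: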
#0 cgroup_move_task (task=0xffff8880074f3400, to=0xffff888005f18600) at kernel/sched/psi.c:962
#1 0xffffffff81182e54 in css_set_move_task (task=0xffff8880074f3400, from_cset=0xffff888005fd7000, to_cset=0xffff888005f18600, use_mg_tasks=<optimized out>) at kernel/cgroup/cgroup.c:899
#2 0xffffffff811830b1 in cgroup_migrate_execute (mgctx=<optimized out>) at kernel/cgroup/cgroup.c:2426
#3 0xffffffff811833cb in cgroup_migrate (leader=0xffff8880074f3400, threadgroup=true, mgctx=0xffff888005f73d30) at kernel/cgroup/cgroup.c:2707
#4 0xffffffff81183519 in cgroup_attach_task (dst_cgrp=0xffff888005f9e800, leader=0xffff8880074f3400, threadgroup=true) at kernel/cgroup/cgroup.c:2740
#5 0xffffffff8118855e in __cgroup1_procs_write (of=0xffff888005fa43c0, buf=<optimized out>, nbytes=<optimized out>, threadgroup=<optimized out>, off=<optimized out>) at kernel/cgroup/cgroup-v1.c:519
#6 0xffffffff811885ae in cgroup1_procs_write (of=<optimized out>, buf=<optimized out>, nbytes=<optimized out>, off=<optimized out>) at kernel/cgroup/cgroup-v1.c:532
#7 0xffffffff8117fa48 in cgroup_file_write (of=<optimized out>, buf=<optimized out>, nbytes=4, off=<optimized out>) at kernel/cgroup/cgroup.c:3697
#8 0xffffffff81337576 in kernfs_fop_write (file=<optimized out>, user_buf=<optimized out>, count=<optimized out>, ppos=0xffff888005fd70a0) at fs/kernfs/file.c:315
#9 0xffffffff81297bfc in vfs_write (pos=<optimized out>, count=4, buf=<optimized out>, file=<optimized out>) at fs/read_write.c:584
#10 vfs_write (file=0xffff888005fc4100, buf=0x28070f0 "145\n", count=<optimized out>, pos=0xffff888005f73ef0) at fs/read_write.c:566
#11 0xffffffff81297eac in ksys_write (fd=<optimized out>, buf=0x28070f0 "145\n", count=4) at fs/read_write.c:639
#12 0xffffffff81297f35 in __do_sys_write (count=<optimized out>, buf=<optimized out>, fd=<optimized out>) at fs/read_write.c:651
#13 __se_sys_write (count=<optimized out>, buf=<optimized out>, fd=<optimized out>) at fs/read_write.c:648
#14 __x64_sys_write (regs=<optimized out>) at fs/read_write.c:648
#15 0xffffffff81c71608 in do_syscall_64 (nr=<optimized out>, regs=0xffff888005f73f58) at arch/x86/entry/common.c:46
#16 0xffffffff81e0007c in entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:118
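The two backtraces also show the ordering inside __cgroup1_procs_write. cgroup_attach_task is called at cgroup-v1.c:519 (this trace), and cgroup_procs_write_finish is called at line 522 (the trace in section 2.1). cgroup_procs_write_finish runs each subsystem's post_attach callback, which for the memory controller is mem_cgroup_move_task. So task->cgroups is switched first, and the charges are moved afterwards. The sketch below is a toy user-space model of the pointer switch done by css_set_move_task and cgroup_move_task. The names css_set_model, task_model and model_* are hypothetical, and a task counter stands in for the kernel's cset->tasks list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct css_set. */
struct css_set_model {
    const char *name;
    int nr_tasks;                    /* length of the kernel's cset->tasks list */
};

/* Hypothetical stand-in for struct task_struct. */
struct task_model {
    int pid;
    struct css_set_model *cgroups;   /* like task_struct->cgroups */
};

/* Like cgroup_move_task(): publish the new css_set pointer.
 * The kernel uses rcu_assign_pointer(task->cgroups, to) here. */
static void model_cgroup_move_task(struct task_model *task, struct css_set_model *to)
{
    task->cgroups = to;
}

/* Like css_set_move_task(): unlink from `from`, switch the pointer,
 * then link the task into `to`. */
void model_css_set_move_task(struct task_model *task,
                             struct css_set_model *from,
                             struct css_set_model *to)
{
    if (from) {
        assert(task->cgroups == from);
        from->nr_tasks--;            /* list_del_init(&task->cg_list) */
    }
    if (to) {
        model_cgroup_move_task(task, to);
        to->nr_tasks++;              /* list_add_tail(&task->cg_list, &to->tasks) */
    }
}
```

After model_css_set_move_task(&t, &a, &b), task t points at css_set b, a has lost one task and b has gained one. Only the task's membership changes here; page ownership and charges are untouched until the post_attach step.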