问题现象
客户发现在物理机上跑读写业务时,出现了一次死机现象,kdump服务未抓到vmcore文件。/var/log/messages里没有发现内核panic报错信息,只有call trace的警告信息。抓取到的call trace信息总共有三种类型:内存分配失败、rmmod、任务超时。
问题不稳定复现,客户分析是在新读写业务开始调试长加电运行后,大概两周多左右出现了死机问题。业务设备由于诸多原因无法邮寄,也无法提供远程环境。但问题紧急程度很高,要求快速解决。
因没有生成vmcore不好准确分析,于是第一时间提供客户监测系统多种性能的脚本放到后台运行,运行一周左右取出这段时间监测到的数据信息以及系统messages。此时系统虽未死机,但已成功复现了上述三类call trace,下文将对抓取到的日志进行分析
日志分析
内存分配失败日志分析
系统message分析
page allocation failure的call trace输出,以下内核输出的时间戳在15:39:46左右:
kernel: [67442.657502] mount: page allocation failure: order:0, mode:0x440000(GFP_NOWAIT|__GFP_THISNODE), nodemask=(null)
kernel: [67442.658760] mount cpuset=/ mems_allowed=0,2,4,6,8,10,12,14
kernel: [67442.659451] CPU: 31 PID: 872267 Comm: mount Kdump: loaded Tainted: G W OE K 4.19.90-25.2.aarch64 #1
kernel: [67442.660681] Hardware name: ... ...
kernel: [67442.661595] Call trace:
kernel: [67442.661910] dump_backtrace+0x0/0x170
kernel: [67442.662392] show_stack+0x24/0x30
kernel: [67442.662809] dump_stack+0xa4/0xe8
kernel: [67442.663256] warn_alloc+0xc4/0x128
kernel: [67442.663664] __alloc_pages_nodemask+0xcd4/0xd70
... ...
kernel: [67442.673551] __arm64_sys_mount+0x28/0x38
kernel: [67442.674033] el0_svc_common+0x78/0x130
kernel: [67442.674505] el0_svc_handler+0x38/0x78
kernel: [67442.674966] el0_svc+0x8/0x1b0
本次失败是申请2^0个page时失败,即连续1个页的内存(pagesize为64kB),一般order 0分配失败代表着系统已经没有内存了。
因GFP_NOWAIT代表不可睡眠等待,__GFP_THISNODE代表从指定内存节点中分配并且无回退机制,没有指定zone则默认是ZONE_NORMAL。怀疑可能是当时指定的node节点里zone normal的order 0链表内没有找到空闲且匹配成功的页,故而没有重试直接返回了申请失败。根据日志中call trace前,业务软件的输出表示将任务绑定到了node 0上,猜测可能是node 0上没有空闲页。
由日志信息可见,死机时pid为872267的进程调用__alloc_pages_nodemask直接申请物理内存。由底层堆栈知,这段过程应该是由用户态的应用程序调用mount系统调用,触发el0中断,从而陷入内核。
内核中调用warn_alloc的函数有:__alloc_pages_slowpath、__vmalloc_area_node、__vmalloc_node_range、vmemmap_alloc_block。根据堆栈函数调用流程知,进入的内存分配函数是__alloc_pages_nodemask,在此函数内会通过get_page_from_freelist快速内存分配。快速分配失败以后,会去调用__alloc_pages_slowpath进行慢速内存分配。在__alloc_pages_slowpath里,先会尝试唤醒kswapd进程回收内存,因本次未设置__GFP_KSWAPD_RECLAIM属性,故未走这个流程。之后,重新尝试分配了一次内存,但是仍旧失败了。失败后的流程需要判断当前是否可以进行内存回收操作,若回收后依旧分配失败,则会进入OOM检查流程。因本次未设置__GFP_DIRECT_RECLAIM属性,故直接判断为内存申请失败就调warn_alloc报错返回,没有走内核direct reclaim回收内存的流程,进而也没有触发OOM。
为了详细了解当时的内存状态,进一步分析meminfo输出。
第一段meminfo输出:
kernel: [67442.726710] Node 0 DMA32 free:119232kB min:1664kB low:2560kB high:3456kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:595072kB writepending:0kB present:2031616kB managed:907904kB mlocked:595072kB kernel_stack:0kB pagetables:128kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:1472kB
kernel: [67442.729920] lowmem_reserve[]: 0 1913 1913
kernel: [67442.730433] Node 0 Normal free:124032kB min:58752kB low:90048kB high:121344kB active_anon:1985152kB inactive_anon:13248kB active_file:803840kB inactive_file:7035136kB unevictable:18799680kB writepending:320kB present:31391744kB managed:31353600kB mlocked:18799680kB kernel_stack:17328kB pagetables:14528kB bounce:0kB free_pcp:34112kB local_pcp:128kB free_cma:0kB
kernel: [67442.733879] lowmem_reserve[]: 0 0 0
kernel: [67442.734339] Node 2 Normal free:12869376kB min:62784kB low:96256kB high:129728kB active_anon:1365696kB inactive_anon:29376kB active_file:2222784kB inactive_file:2928960kB unevictable:12462720kB writepending:2240kB present:33554432kB managed:33516416kB mlocked:12462720kB kernel_stack:18512kB pagetables:19008kB bounce:0kB free_pcp:66944kB local_pcp:64kB free_cma:0kB
kernel: [67442.737912] lowmem_reserve[]: 0 0 0
kernel: [67442.738457] Node 4 Normal free:1002240kB min:62784kB low:96256kB high:129728kB active_anon:1204288kB inactive_anon:9600kB active_file:283264kB inactive_file:4446656kB unevictable:25140416kB writepending:2240kB present:33554432kB managed:33516480kB mlocked:25140416kB kernel_stack:5776kB pagetables:15616kB bounce:0kB free_pcp:65856kB local_pcp:1024kB free_cma:0kB
kernel: [67442.741901] lowmem_reserve[]: 0 0 0
kernel: [67442.742368] Node 6 Normal free:15073536kB min:62784kB low:96256kB high:129728kB active_anon:475520kB inactive_anon:20096kB active_file:153216kB inactive_file:1919424kB unevictable:15502400kB writepending:1024kB present:33554432kB managed:33516480kB mlocked:15502464kB kernel_stack:10272kB pagetables:8448kB bounce:0kB free_pcp:64896kB local_pcp:1024kB free_cma:0kB
kernel: [67442.745760] lowmem_reserve[]: 0 0 0
kernel: [67442.746199] Node 8 Normal free:3885312kB min:62656kB low:96064kB high:129472kB active_anon:1126464kB inactive_anon:1024kB active_file:528768kB inactive_file:2995776kB unevictable:23091456kB writepending:768kB present:33488896kB managed:33450816kB mlocked:23091456kB kernel_stack:11856kB pagetables:11840kB bounce:0kB free_pcp:51456kB local_pcp:832kB free_cma:0kB
kernel: [67442.749695] lowmem_reserve[]: 0 0 0
kernel: [67442.750213] Node 10 Normal free:13484864kB min:62784kB low:96256kB high:129728kB active_anon:2751808kB inactive_anon:448kB active_file:714112kB inactive_file:3798400kB unevictable:12332672kB writepending:1088kB present:33554432kB managed:33516416kB mlocked:12332672kB kernel_stack:7648kB pagetables:4864kB bounce:0kB free_pcp:49216kB local_pcp:832kB free_cma:0kB
kernel: [67442.753654] lowmem_reserve[]: 0 0 0
kernel: [67442.754118] Node 12 Normal free:19964416kB min:62720kB low:96192kB high:129664kB active_anon:333568kB inactive_anon:2048kB active_file:7296kB inactive_file:558144kB unevictable:12332480kB writepending:64kB present:33554432kB managed:33492928kB mlocked:12332480kB kernel_stack:2592kB pagetables:768kB bounce:0kB free_pcp:33728kB local_pcp:896kB free_cma:0kB
kernel: [67442.757542] lowmem_reserve[]: 0 0 0
kernel: [67442.758019] Node 14 Normal free:20600384kB min:62720kB low:96192kB high:129664kB active_anon:356416kB inactive_anon:640kB active_file:12416kB inactive_file:7232kB unevictable:12332480kB writepending:0kB present:33554432kB managed:33474368kB mlocked:12332480kB kernel_stack:2256kB pagetables:1472kB bounce:0kB free_pcp:27136kB local_pcp:896kB free_cma:0kB
kernel: [67442.761357] lowmem_reserve[]: 0 0 0
除了Node 0 DMA32,其他节点均未留有CMA预留区域。所有Node Normal区域的free值均高于low、min水位线,说明当前系统仍有空闲内存。但一般情况下min以下的内存是不会被分配的,故各节点Normal区域可分配内存大小为,Node 0&