Symptom:
Recently, several workloads (on both physical machines and virtual machines) crashed while running normally; the logs pointed to a suspected memory problem.
All of the workloads run on Kubernetes (a container cluster management system).
Initial log triage:
The messages log reported the following (the excerpt below is from a virtual machine). These entries repeated in a loop right up until the crash.
Oct 16 00:51:51 uos-PC kernel: [4307490.033245] Tasks state (memory values in pages):
Oct 16 00:51:51 uos-PC kernel: [4307490.033246] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Oct 16 00:51:51 uos-PC kernel: [4307490.034341] [ 8279] 0 8279 19 1 393216 0 -998 pause
Oct 16 00:51:51 uos-PC kernel: [4307490.034344] [ 8958] 0 8958 197645 1614 2031616 0 -997 promtail
Oct 16 00:51:54 uos-PC kernel: [4307493.279922] cni0: port 1(vethcfdc7bc3) entered disabled state
Oct 16 00:51:54 uos-PC kernel: [4307493.287374] device vethcfdc7bc3 left promiscuous mode
Oct 16 00:51:54 uos-PC kernel: [4307493.287378] cni0: port 1(vethcfdc7bc3) entered disabled state
Oct 16 00:51:55 uos-PC kernel: [4307494.173866] cni0: port 1(vethf6a2e403) entered blocking state
Oct 16 00:51:55 uos-PC kernel: [4307494.173871] cni0: port 1(vethf6a2e403) entered disabled state
Oct 16 00:51:55 uos-PC kernel: [4307494.173999] device vethf6a2e403 entered promiscuous mode
Oct 16 00:51:55 uos-PC kernel: [4307494.174133] cni0: port 1(vethf6a2e403) entered blocking state
Oct 16 00:51:55 uos-PC kernel: [4307494.174135] cni0: port 1(vethf6a2e403) entered forwarding state
Oct 16 00:52:13 uos-PC NetworkManager[762]: <info> [1665852733.6197] device (vethf6a2e403): carrier: link connected
Oct 16 00:52:13 uos-PC NetworkManager[762]: <info> [1665852733.6201] manager: (vethf6a2e403): new Veth device (/org/freedesktop/NetworkManager/Devices/945)
Oct 16 00:52:13 uos-PC NetworkManager[762]: <info> [1665852733.6219] device (vethcfdc7bc3): released from master device cni0
Oct 16 00:52:13 uos-PC ansible-setup: Invoked with filter=* gather_subset=['!all', 'min', 'hardware'] fact_path=/etc/ansible/facts.d gather_timeout=120
Oct 16 00:54:07 uos-PC ansible-setup: Invoked with filter=* gather_subset=['!all', 'min', 'hardware'] fact_path=/etc/ansible/facts.d gather_timeout=120
Oct 16 00:56:13 uos-PC ansible-setup: Invoked with filter=* gather_subset=['!all', 'min', 'hardware'] fact_path=/etc/ansible/facts.d gather_timeout=120
Oct 16 00:58:11 uos-PC ansible-setup: Invoked with filter=* gather_subset=['!all', 'min', 'hardware'] fact_path=/etc/ansible/facts.d gather_timeout=120
Oct 16 01:00:12 uos-PC ansible-setup: Invoked with filter=* gather_subset=['!all', 'min', 'hardware'] fact_path=/etc/ansible/facts.d gather_timeout=120
Oct 16 01:02:12 uos-PC ansible-setup: Invoked with filter=* gather_subset=['!all', 'min', 'hardware'] fact_path=/etc/ansible/facts.d gather_timeout=120
Oct 16 01:04:15 uos-PC ansible-setup: Invoked with filter=* gather_subset=['!all', 'min', 'hardware'] fact_path=/etc/ansible/facts.d gather_timeout=120
Oct 16 01:06:15 uos-PC ansible-setup: Invoked with filter=* gather_subset=['!all', 'min', 'hardware'] fact_path=/etc/ansible/facts.d gather_timeout=120
Oct 16 01:08:13 uos-PC ansible-setup: Invoked with filter=* gather_subset=['!all', 'min', 'hardware'] fact_path=/etc/ansible/facts.d gather_timeout=120
Oct 16 01:10:14 uos-PC ansible-setup: Invoked with filter=* gather_subset=['!all', 'min', 'hardware'] fact_path=/etc/ansible/facts.d gather_timeout=120
Oct 16 01:12:18 uos-PC ansible-setup: Invoked with filter=* gather_subset=['!all', 'min', 'hardware'] fact_path=/etc/ansible/facts.d gather_timeout=120
Oct 16 01:14:17 uos-PC ansible-setup: Invoked with filter=* gather_subset=['!all', 'min', 'hardware'] fact_path=/etc/ansible/facts.d gather_timeout=120
Oct 16 01:16:12 uos-PC ansible-setup: Invoked with filter=* gather_subset=['!all', 'min', 'hardware'] fact_path=/etc/ansible/facts.d gather_timeout=120
Oct 16 01:18:13 uos-PC ansible-setup: Invoked with filter=* gather_subset=['!all', 'min', 'hardware'] fact_path=/etc/ansible/facts.d gather_timeout=120
Oct 16 01:20:11 uos-PC ansible-setup: Invoked with filter=* gather_subset=['!all', 'min', 'hardware'] fact_path=/etc/ansible/facts.d gather_timeout=120
Oct 16 01:22:13 uos-PC ansible-setup: Invoked with filter=* gather_subset=['!all', 'min', 'hardware'] fact_path=/etc/ansible/facts.d gather_timeout=120
Oct 16 01:23:22 uos-PC kernel: [4309380.849934] promtail invoked oom-killer: gfp_mask=0x6000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=-997
Oct 16 01:23:22 uos-PC kernel: [4309380.849936] promtail cpuset=19b2adbcc3f21d764916d3a1eba29b989a006eb0d09198500fa79c962e76bd41 mems_allowed=0
Oct 16 01:23:22 uos-PC kernel: [4309380.849941] CPU: 12 PID: 26431 Comm: promtail Not tainted 4.19.0-arm64-server #3211
Oct 16 01:23:22 uos-PC kernel: [4309380.849943] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
Oct 16 01:23:22 uos-PC kernel: [4309380.849943] Call trace:
Oct 16 01:23:22 uos-PC kernel: [4309380.849948] dump_backtrace+0x0/0x190
Oct 16 01:23:22 uos-PC kernel: [4309380.849949] show_stack+0x14/0x20
Oct 16 01:23:22 uos-PC kernel: [4309380.849952] dump_stack+0xa8/0xcc
Oct 16 01:23:22 uos-PC kernel: [4309380.849955] dump_header+0x64/0x1d8
Oct 16 01:23:22 uos-PC kernel: [4309380.849956] oom_kill_process+0x104/0x358
Oct 16 01:23:22 uos-PC kernel: [4309380.849957] out_of_memory+0x170/0x4b0
Oct 16 01:23:22 uos-PC kernel: [4309380.849961] mem_cgroup_out_of_memory+0x94/0xa0
Oct 16 01:23:22 uos-PC kernel: [4309380.849962] try_charge+0x5fc/0x678
Oct 16 01:23:22 uos-PC kernel: [4309380.849963] mem_cgroup_try_charge+0x6c/0x138
Oct 16 01:23:22 uos-PC kernel: [4309380.849964] mem_cgroup_try_charge_delay+0x20/0x50
Oct 16 01:23:22 uos-PC kernel: [4309380.849968] __handle_mm_fault+0x8d0/0xbe0
Oct 16 01:23:22 uos-PC kernel: [4309380.849969] handle_mm_fault+0xec/0x1b0
Oct 16 01:23:22 uos-PC kernel: [4309380.849972] do_page_fault+0x168/0x460
Oct 16 01:23:22 uos-PC kernel: [4309380.849973] do_translation_fault+0x58/0x60
Oct 16 01:23:22 uos-PC kernel: [4309380.849974] do_mem_abort+0x3c/0xd0
Oct 16 01:23:22 uos-PC kernel: [4309380.849976] el0_da+0x20/0x24
Oct 16 01:23:22 uos-PC kernel: [4309380.849977] Task in /kubepods/podac0a8ef5-57c2-49b5-ae3d-31725fc1766d/cae596f746fe75c5186591f9082d07b5c3832e5f72380e2a50442d5cf268f26e killed as a result of limit of /kubepods/podac0a8ef5-57c2-49b5-ae3d-31725fc1766d
Oct 16 01:23:22 uos-PC kernel: [4309380.849981] memory: usage 204864kB, limit 204800kB, failcnt 161782
Oct 16 01:23:22 uos-PC kernel: [4309380.849982] memory+swap: usage 0kB, limit 9007199254740928kB, failcnt 0
Oct 16 01:23:22 uos-PC kernel: [4309380.849983] kmem: usage 55232kB, limit 9007199254740928kB, failcnt 0
Oct 16 01:23:22 uos-PC kernel: [4309380.849984] Memory cgroup stats for /kubepods/podac0a8ef5-57c2-49b5-ae3d-31725fc1766d: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Oct 16 01:23:22 uos-PC kernel: [4309380.849990] Memory cgroup stats for /kubepods/podac0a8ef5-57c2-49b5-ae3d-31725fc1766d/00c1facfa6161817fcc9f6e393215c490c8bd17b353c9a9b6e5c42e525a00919: cache:0KB rss:2688KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Oct 16 01:23:22 uos-PC kernel: [4309380.849995] Memory cgroup stats for /kubepods/podac0a8ef5-57c2-49b5-ae3d-31725fc1766d/9930d9af577bb27e537b09d9f2a1557765d96a6cccc3d23e99981a7cb4eb9d43: cache:0KB rss:2752KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Oct 16 01:23:22 uos-PC kernel: [4309380.850000] Memory cgroup stats for /kubepods/podac0a8ef5-57c2-49b5-ae3d-31725fc1766d/82f6a46887e925cbff7b5faacf5eeecd9f09bf496259d3b205e68ed4f89677de: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Oct 16 01:23:22 uos-PC kernel: [4309380.850004] Memory cgroup stats for /kubepods/podac0a8ef5-57c2-49b5-ae3d-31725fc1766d/34498353241a2400335818271c4ecbaf222e05269f4966424ed57f292e9fca72: cache:1152KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Oct 16 01:23:22 uos-PC kernel: [4309380.850009] Memory cgroup stats for /kubepods/podac0a8ef5-57c2-49b5-ae3d-31725fc1766d/1330a72159938ec94d06146d760169089b67f63b9846ef15bdbbe8101a4711c8: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
1. The logs above show that promtail triggered the OOM killer. Analysis showed, however, that these entries are not the cause of the crash: they only reflect a pod exceeding its memory limit and being OOM-killed.
2. Beyond these entries there was no other useful information, and the serial console yielded nothing either. (To explain: some crashes happen within seconds, or instantly, leaving the system no time to write logs; in other cases the logs emitted at crash time are garbage, so nothing can be concluded from them.)
3. Since this is a crash with some probability of recurring, we deployed kdump directly to capture a crash dump.
kdump / crash-log analysis
A NULL-pointer data abort caused a kernel oops. The call path: the execve system call, then do_filp_open, then path_openat, and finally do_last.
[273361.406280] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[273361.426332] Mem abort info:
[273361.433842] ESR = 0x96000005
[273361.441854] Exception class = DABT (current EL), IL = 32 bits
[273361.455846] SET = 0, FnV = 0
[273361.463817] EA = 0, S1PTW = 0
[273361.471969] Data abort info:
[273361.479560] ISV = 0, ISS = 0x00000005
[273361.489117] CM = 0, WnR = 0
[273361.496825] user pgtable: 64k pages, 48-bit VAs, pgdp = 00000000952bbc27
[273361.512344] [0000000000000000] pgd=0000000000000000, pud=0000000000000000
[273361.528030] Internal error: Oops: 96000005 [#1] SMP
[273361.539697] Modules linked in: xt_CT ipt_rpfilter iptable_raw ip_set_hash_ip ip_set_hash_net xt_multiport xt_nat xt_tcpudp veth ip6table_filter nf_conntrack_netlink xt_addrtype xt_set ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_hash_ipport ip_set_bitmap_port ip_set dummy ip6table_nat ip6_tables ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs ceph libceph dns_resolver fscache nft_chain_route_ipv6 ip6t_MASQUERADE nft_counter nft_compat nft_chain_nat_ipv6 nf_nat_ipv6 nf_tables iptable_mangle ipt_MASQUERADE xt_conntrack xt_comment iptable_filter xt_mark iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bpfilter nfnetlink nls_iso8859_1 nls_cp437 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ipmi_ssif ipmi_si ipmi_devintf ipmi_msghandler lightapple sunrpc efivarfs ip_tables x_tables
[273361.696816] ipv6 xfs btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 raid0 multipath linear md_mod ast usb_storage firmware_class i40e igb i2c_designware_platform i2c_designware_core hid_generic usbkbd usbmouse usbhid
[273361.750809] Process bash (pid: 205745, stack limit = 0x00000000609e69a1)
[273361.766332] CPU: 61 PID: 205745 Comm: bash Kdump: loaded Not tainted 4.19.0-arm64-server #5015
[273361.785847] Hardware name: Kedacom HyperForce-1000/HyperForce-1000, BIOS 2.6 20210816
[273361.803716] pstate: 20000005 (nzCv daif -PAN -UAO)
[273361.815232] pc : do_last+0x44/0x878
[273361.824004] lr : path_openat+0x60/0x238
[273361.833470] sp : ffff84839d82fb50
[273361.841828] x29: ffff84839d82fb50 x28: ffff8483aff74080
[273361.854350] x27: ffff84800792c6c0 x26: 0000aaac2a66a370
[273361.866857] x25: 0000000000000000 x24: 0000000000000041
[273361.879351] x23: 0000000000000001 x22: ffff86813f007f00
[273361.891898] x21: 0000000000020020 x20: ffff84839d82fd9c
[273361.904385] x19: ffff84839d82fc88 x18: 0000000000000000
[273361.916840] x17: 0000000000000000 x16: 0000000000000000
[273361.929285] x15: 0000000000000000 x14: ffff8483ce9f62a0
[273361.941702] x13: 0000000000000000 x12: 0000000000000000
[273361.954085] x11: 0000000000000000 x10: d0d0d0d0d0d0a3bc
[273361.966520] x9 : 0000000000000000 x8 : ffff8483aff74080
[273361.978854] x7 : e67d9ffff09075fe x6 : ffff8483aff733ec
[273361.991169] x5 : 0000000000000002 x4 : fefefefefeff0003
[273362.003471] x3 : 0000000000000000 x2 : 0000000000000000
[273362.015758] x1 : 0000000000000051 x0 : ffff84839d82fc88
[273362.028055] Call trace:
[273362.034317] do_last+0x44/0x878
[273362.042020] path_openat+0x60/0x238
[273362.050443] do_filp_open+0x60/0xc0
[273362.058821] do_open_execat+0x60/0x1d8
[273362.067711] __do_execve_file.isra.12+0x634/0x7b8
[273362.078629] do_execve+0x2c/0x38
[273362.086441] __arm64_sys_execve+0x28/0x38
[273362.095875] el0_svc_common+0x90/0x160
[273362.104771] el0_svc_handler+0x9c/0xa8
[273362.113659] el0_svc+0x8/0xc
[273362.120721] Code: f9401b82 121702b9 b9403801 b9404403 (79400058)
[273362.134550] SMP: stopping secondary CPUs
[273362.144910] Starting crashdump kernel...
[273362.154883] Bye!
Source code
fs/namei.c
/*
 * Handle the last step of open()
 */
static int do_last(struct nameidata *nd,
		   struct file *file, const struct open_flags *op)
{
	struct dentry *dir = nd->path.dentry;
	kuid_t dir_uid = dir->d_inode->i_uid;	/* line 3262 */
	umode_t dir_mode = dir->d_inode->i_mode;
	int open_flag = op->open_flag;
	bool will_truncate = (open_flag & O_TRUNC) != 0;
	bool got_write = false;
	int acc_mode = op->acc_mode;
	unsigned seq;
	struct inode *inode;
	struct path path;
	int error;

	nd->flags &= ~LOOKUP_PARENT;
	nd->flags |= op->intent;

	if (nd->last_type != LAST_NORM) {
		error = handle_dots(nd, nd->last_type);
		if (unlikely(error))
			return error;
		goto finish_open;
	}
	...
Crash analysis
crash> bt
PID: 205745 TASK: ffff84800792c6c0 CPU: 61 COMMAND: "bash"
#0 [ffff84839d82f5c0] machine_kexec at ffff00000809c720
#1 [ffff84839d82f620] __crash_kexec at ffff0000081758fc
#2 [ffff84839d82f790] crash_kexec at ffff0000081759f0
#3 [ffff84839d82f7c0] die at ffff00000808d7f4
#4 [ffff84839d82f800] die_kernel_fault at ffff00000809fee4
#5 [ffff84839d82f830] __do_kernel_fault at ffff00000809ff80
#6 [ffff84839d82f860] do_page_fault at ffff0000080a01e4
#7 [ffff84839d82f950] do_translation_fault at ffff000008a850c4
#8 [ffff84839d82f960] do_mem_abort at ffff000008081250
#9 [ffff84839d82fb40] el1_ia at ffff0000080830cc
PC: ffff0000082ada6c [do_last+68]
LR: ffff0000082ae300 [path_openat+96]
SP: ffff84839d82fb50 PSTATE: 20000005
X29: ffff84839d82fb50 X28: ffff8483aff74080 X27: ffff84800792c6c0
X26: 0000aaac2a66a370 X25: 0000000000000000 X24: 0000000000000041
X23: 0000000000000001 X22: ffff86813f007f00 X21: 0000000000020020
X20: ffff84839d82fd9c X19: ffff84839d82fc88 X18: 0000000000000000
X17: 0000000000000000 X16: 0000000000000000 X15: 0000000000000000
X14: ffff8483ce9f62a0 X13: 0000000000000000 X12: 0000000000000000
X11: 0000000000000000 X10: d0d0d0d0d0d0a3bc X9: 0000000000000000
X8: ffff8483aff74080 X7: e67d9ffff09075fe X6: ffff8483aff733ec
X5: 0000000000000002 X4: fefefefefeff0003 X3: 0000000000000000
X2: 0000000000000000 X1: 0000000000000051 X0: ffff84839d82fc88
#10 [ffff84839d82fb50] do_last at ffff0000082ada68
#11 [ffff84839d82fbf0] path_openat at ffff0000082ae2fc
#12 [ffff84839d82fc50] do_filp_open at ffff0000082af60c
#13 [ffff84839d82fd60] do_open_execat at ffff0000082a3c3c
#14 [ffff84839d82fdb0] __do_execve_file at ffff0000082a5e98
#15 [ffff84839d82fe40] do_execve at ffff0000082a6080
#16 [ffff84839d82fe50] __arm64_sys_execve at ffff0000082a627c
#17 [ffff84839d82fe70] el0_svc_common at ffff0000080951dc
#18 [ffff84839d82feb0] el0_svc_handler at ffff000008095348
#19 [ffff84839d82fff0] el0_svc at ffff000008083f84
PC: 0000fffcc67d5094 LR: 0000aaac2a5585c8 SP: 0000ffffda204c20
X29: 0000ffffda204c20 X28: 0000fffcc66e2570 X27: 0000fffcc66eada0
X26: 0000fffcc66e2590 X25: 0000fffcc68379b0 X24: 0000aaac2a66a370
X23: 0000fffcc66ea910 X22: 00000000ffffffff X21: 0000aaac2a669de0
X20: 0000aaac2a64d000 X19: 0000aaac2a669de0 X18: 0000fffcc6829447
X17: 0000fffcc67d5084 X16: 0000aaac2a64c768 X15: 0000000000000000
X14: 0000000000000001 X13: 0000000000000000 X12: 0000000000000000
X11: 00000000ffffffff X10: 0000000000000000 X9: 00000000fffffffe
X8: 00000000000000dd X7: 000000000000000c X6: 0000000000000000
X5: 0000fffcc66ea92c X4: 0000000000000000 X3: 0000fffcc683a450
X2: 0000aaac2a66a370 X1: 0000fffcc66ea910 X0: 0000aaac2a669de0
ORIG_X0: 0000aaac2a669de0 SYSCALLNO: dd PSTATE: 60000000
crash> dis -l path_openat+96
/home/uos/jenkins/workspace/iso-server-SP4/server-kernel-pipeline/arm-kernel/fs/namei.c: 3538
0xffff0000082ae300 <path_openat+96>: mov w19, w0
crash> dis -l do_last+68
/home/uos/jenkins/workspace/iso-server-SP4/server-kernel-pipeline/arm-kernel/fs/namei.c: 3262
0xffff0000082ada6c <do_last+68>: ldrh w24, [x2]
crash> dis do_last
0xffff0000082ada28 <do_last>: stp x29, x30, [sp,#-160]!
0xffff0000082ada2c <do_last+4>: mov x29, sp
0xffff0000082ada30 <do_last+8>: stp x19, x20, [sp,#16]    // save the current X19/X20 on the stack
0xffff0000082ada34 <do_last+12>: mov x20, x2              // third argument (op)
0xffff0000082ada38 <do_last+16>: mov x19, x0              // first argument (nd)
0xffff0000082ada3c <do_last+20>: stp x21, x22, [sp,#32]   // save X21/X22 on the stack
0xffff0000082ada40 <do_last+24>: mov x22, x1              // second argument (file)
0xffff0000082ada44 <do_last+28>: stp x23, x24, [sp,#48]   // save X23/X24
0xffff0000082ada48 <do_last+32>: stp x25, x26, [sp,#64]
0xffff0000082ada4c <do_last+36>: stp x27, x28, [sp,#80]
0xffff0000082ada50 <do_last+40>: ldr x28, [x0,#8]         // load *(x0+8): nameidata->path.dentry
0xffff0000082ada54 <do_last+44>: ldr w21, [x2]            // op->open_flag
0xffff0000082ada58 <do_last+48>: ldr w23, [x2,#8]         // op->acc_mode
0xffff0000082ada5c <do_last+52>: ldr x2, [x28,#48]        // dentry->d_inode
0xffff0000082ada60 <do_last+56>: and w25, w21, #0x200
0xffff0000082ada64 <do_last+60>: ldr w1, [x0,#56]         // nameidata->flags
0xffff0000082ada68 <do_last+64>: ldr w3, [x0,#68]         // nameidata->last_type
0xffff0000082ada6c <do_last+68>: ldrh w24, [x2]           // faulting instruction: x2 (dentry->d_inode) is NULL
0xffff0000082ada70 <do_last+72>: and w1, w1, #0xffffffef
0xffff0000082ada74 <do_last+76>: ldr w26, [x2,#4]
0xffff0000082ada78 <do_last+80>: str w1, [x0,#56]
0xffff0000082ada7c <do_last+84>: ldr w2, [x20,#12]
0xffff0000082ada80 <do_last+88>: orr w1, w1, w2
0xffff0000082ada84 <do_last+92>: str w1, [x0,#56]
0xffff0000082ada88 <do_last+96>: cbz w3, 0xffff0000082adaf8 <do_last+208>
X19 and X20 had not been modified before the faulting instruction.
crash> struct -o nameidata
struct nameidata {
[0] struct path path;
[16] struct qstr last;
[32] struct path root;
[48] struct inode *inode;
[56] unsigned int flags;
[60] unsigned int seq;
[64] unsigned int m_seq;
[68] int last_type;
[72] unsigned int depth;
[76] int total_link_count;
[80] struct saved *stack;
[88] struct saved internal[2];
[184] struct filename *name;
[192] struct nameidata *saved;
[200] struct inode *link_inode;
[208] unsigned int root_seq;
[212] int dfd;
}
SIZE: 216
X19 holds the first argument (the nameidata pointer); its current value is ffff84839d82fc88.
crash> struct -x nameidata ffff84839d82fc88
struct nameidata {
path = {
mnt = 0xffff82839cd2b720,
dentry = 0xffff8483aff74080
},
last = {
{
{
hash = 0x1fdf2a43,
len = 0x2
},
hash_len = 0x21fdf2a43
},
name = 0xffff8683c63e8025 "ls"
},
root = {
mnt = 0xffff82839cd2b720,
dentry = 0xffff8683b0c469c0 ---------- the mount point is /
},
inode = 0xffff8483a2e6b430,
flags = 0x51,
seq = 0x2,
m_seq = 0x1925b80,
last_type = 0x0,
depth = 0x0,
total_link_count = 0x0,
stack = 0xffff84839d82fce0,
internal = {{
link = {
mnt = 0xffff84839d82fd10,
dentry = 0xffff0000083f5674 <security_prepare_creds+60>
},
done = {
fn = 0xffff00000907be08 <apparmor_hooks+2000>,
arg = 0x6000c0
},
name = 0xffff848055deea00 "\003",
seq = 0x80fb00c
}, {
link = {
mnt = 0xffff84839d82fd40,
dentry = 0xffff0000080fb0b8 <prepare_creds+208>
},
done = {
fn = 0xffff8681154e0f00,
arg = 0xffff848055deea00
},
name = 0xffff868121395500 "",
seq = 0x0
}},
name = 0xffff8683c63e8000,
saved = 0x0,
link_inode = 0x0,
root_seq = 0x2,
dfd = 0xffffff9c
}
crash> struct -x nameidata.path.dentry ffff84839d82fc88
path.dentry = 0xffff8483aff74080
This equals the value of X28.
crash> struct -x dentry.d_inode 0xffff8483aff74080
d_inode = 0x0
Next we searched the upstream kernel community's commit history for this file, to check whether any commit relates to the NULL inode observed here.
The suspected related patch is described below:
commit 06008ebcd5b21ad4b1fc2ceaaa40fb263f26666a
Author: Al Viro <viro@zeniv.linux.org.uk>
Date: Thu Feb 20 15:19:32 2020 +0800
vfs: fix do_last() regression
commit 6404674acd596de41fd3ad5f267b4525494a891a upstream.
Brown paperbag time: fetching ->i_uid/->i_mode really should've been
done from nd->inode. I even suggested that, but the reason for that has
slipped through the cracks and I went for dir->d_inode instead - made
for more "obvious" patch.
Analysis:
- at the entry into do_last() and all the way to step_into() (a function defined in the same file): dir (aka
nd->path.dentry) is known not to have been freed; so's nd->inode and
it's equal to dir->d_inode unless we are already doomed to -ECHILD.
inode of the file to get opened is not known.
- after step_into(): inode of the file to get opened is known; dir
might be pointing to freed memory/be negative/etc.
- at the call of may_create_in_sticky(): guaranteed to be out of RCU
mode; inode of the file to get opened is known and pinned; dir might
be garbage.
The last was the reason for the original patch. Except that at the
do_last() entry we can be in RCU mode and it is possible that
nd->path.dentry->d_inode has already changed under us.
In that case we are going to fail with -ECHILD, but we need to be
careful; nd->inode is pointing to valid struct inode and it's the same
as nd->path.dentry->d_inode in "won't fail with -ECHILD" case, so we
should use that.
Reported-by: "Rantala, Tommi T. (Nokia - FI/Espoo)" <tommi.t.rantala@nokia.com>
Reported-by: syzbot+190005201ced78a74ad6@syzkaller.appspotmail.com
Wearing-brown-paperbag: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@kernel.org
Fixes: d0cb50185ae9 ("do_last(): fetch directory ->i_mode and ->i_uid before it's too late")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
diff --git a/fs/namei.c b/fs/namei.c
index 1dd68b3a209e..18ddae1365c7 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3266,8 +3266,8 @@ static int do_last(struct nameidata *nd,
struct file *file, const struct open_flags *op)
{
struct dentry *dir = nd->path.dentry;
- kuid_t dir_uid = dir->d_inode->i_uid; // dir->d_inode can be NULL here, causing the NULL-pointer dereference
- umode_t dir_mode = dir->d_inode->i_mode;
+ kuid_t dir_uid = nd->inode->i_uid;
+ umode_t dir_mode = nd->inode->i_mode;
int open_flag = op->open_flag;
bool will_truncate = (open_flag & O_TRUNC) != 0;
bool got_write = false;
Verify with crash whether, after the patch's change, the fetched value would still be NULL:
crash> struct -x nameidata.inode ffff84839d82fc88
inode = 0xffff8483a2e6b430
crash> struct inode.i_uid 0xffff8483a2e6b430
i_uid = {
val = 0
}
It is indeed no longer NULL: with the patch applied, the NULL inode problem is gone.
Preliminary conclusion: this kernel had merged the fix for CVE-2020-8428, "do_last(): fetch directory ->i_mode and ->i_uid before it's too late", but that fix contained a design flaw. Upstream later merged the follow-up patch
vfs: fix do_last() regression
to repair the flaw in the CVE fix.
With the follow-up patch applied, heavy stress testing no longer reproduced the crash; the problem is resolved.