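[Linux] Analyzing a vmcore generated by a hung_task panic

Introduction

1. The problem: in the kernel log below, the first part is an OOM kill and the second part is a hung_task panic.
2. Each part is explained in turn. The full log is as follows: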

[75834.243209] kodo invoked oom-killer: gfp_mask=0x600040(GFP_NOFS), order=0, oom_score_adj=968
[75834.245657] CPU: 0 PID: 23476 Comm: kodo Kdump: loaded Tainted: G           OE     4.19.90-2305.1.0.0199.78.uel20.x86_64 #1
[75834.248210] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 1.11.0-2.el7 04/01/2014
[75834.250623] Call Trace:
[75834.252090]  dump_stack+0x66/0x8b
[75834.253680]  dump_header+0x4a/0x1ec
[75834.255234]  oom_kill_process+0x24f/0x270
[75834.257018]  out_of_memory+0x141/0x570
[75834.259117]  mem_cgroup_out_of_memory+0xb5/0xd0
[75834.260763]  try_charge+0x723/0x770
[75834.262496]  ? mem_cgroup_commit_charge+0x7f/0x4e0
[75834.264713]  mem_cgroup_try_charge+0x86/0x180
[75834.266306]  __add_to_page_cache_locked+0x60/0x290
[75834.268318]  add_to_page_cache_lru+0x4a/0xf0
[75834.270041]  iomap_readpages_actor+0x129/0x2a0
[75834.271760]  ? iomap_dio_bio_end_io+0x190/0x190
[75834.273816]  iomap_apply+0xba/0x160
[75834.275765]  ? iomap_dio_bio_end_io+0x190/0x190
[75834.277348]  iomap_readpages+0xaa/0x1e0
[75834.279000]  ? iomap_dio_bio_end_io+0x190/0x190
[75834.280679]  read_pages+0x6d/0x1d0
[75834.282123]  ? __do_page_cache_readahead+0x16c/0x1d0
[75834.283745]  __do_page_cache_readahead+0x16c/0x1d0
[75834.285347]  filemap_fault+0x298/0x8a0
[75834.286755]  ? kmem_cache_free+0x180/0x1b0
[75834.288988]  __xfs_filemap_fault+0x72/0x200 [xfs]
[75834.290618]  __do_fault+0x33/0x110
[75834.291988]  do_fault+0x12e/0x490
[75834.293451]  __handle_mm_fault+0x613/0x690
[75834.295491]  handle_mm_fault+0xc4/0x200
[75834.296884]  __do_page_fault+0x240/0x4c0
[75834.298539]  do_page_fault+0x31/0x130
[75834.300068]  ? async_page_fault+0x8/0x30
[75834.301720]  async_page_fault+0x1e/0x30
[75834.303468] memory: usage 12582792kB, limit 12582912kB, failcnt 317157
[75834.305486] memory+swap: usage 12582792kB, limit 9007199254740988kB, failcnt 0
[75834.308073] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
[75834.310515] Memory cgroup stats for /kubepods/burstable/podd5e7b3e0-de6a-4965-91c6-7cd399c77b7a: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
[75834.317024] Memory cgroup stats for /kubepods/burstable/podd5e7b3e0-de6a-4965-91c6-7cd399c77b7a/5feef662206c588f4751444e30c4257c1dfe6f62bec8d5c20bec457186b70fe7: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
[75834.324632] Memory cgroup stats for /kubepods/burstable/podd5e7b3e0-de6a-4965-91c6-7cd399c77b7a/4e74f074587671f5e770d3f8071c630a70ede73ee423d59a6dd49149c3a6c734: cache:17524KB rss:12562956KB rss_huge:6912000KB shmem:0KB mapped_file:1188KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:12562956KB inactive_file:16140KB active_file:12KB unevictable:0KB
[75834.333179] Tasks state (memory values in pages):
[75834.335680] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[75834.338171] [  22697]     0 22697      256        1    32768        0          -998 pause
[75834.340836] [  23362]     0 23362  3470438  3140655 25550848        0           968 kodo
[75834.343473] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=4e74f074587671f5e770d3f8071c630a70ede73ee423d59a6dd49149c3a6c734,mems_allowed=0,oom_memcg=/kubepods/burstable/podd5e7b3e0-de6a-4965-91c6-7cd399c77b7a,task_memcg=/kubepods/burstable/podd5e7b3e0-de6a-4965-91c6-7cd399c77b7a/4e74f074587671f5e770d3f8071c630a70ede73ee423d59a6dd49149c3a6c734,task=kodo,pid=23362,uid=0
[75834.354192] Memory cgroup out of memory: Kill process 23362 (kodo) score 1968 or sacrifice child
[75834.357745] Killed process 23362 (kodo) total-vm:13881752kB, anon-rss:12562620kB, file-rss:0kB, shmem-rss:0kB
[75834.736239] oom_reaper: reaped process 23362 (kodo), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[76349.203912] TCP: request_sock_TCP: Possible SYN flooding on port 9527. Sending cookies.  Check SNMP counters.
[85988.503793] INFO: task kodo:2939685 blocked for more than 1200 seconds.
[85988.506238]       Tainted: G           OE     4.19.90-2305.1.0.0199.78.uel20.x86_64 #1
[85988.508710] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[85988.512771] kodo            D    0 2939685 2939616 0x00000080
[85988.515238] Call Trace:
[85988.517192]  ? __schedule+0x286/0x740
[85988.517199]  schedule+0x29/0xc0
[85988.521494]  schedule_preempt_disabled+0xa/0x10
[85988.523722]  __mutex_lock.isra.7+0x20b/0x470
[85988.525780]  ? fuse_lock_inode+0x27/0x30 [fuse]
[85988.527911]  fuse_lock_inode+0x27/0x30 [fuse]
[85988.529928]  fuse_lookup+0x46/0x140 [fuse]
[85988.531907]  ? d_alloc_parallel+0x95/0x4d0
[85988.533942]  __lookup_slow+0x97/0x150
[85988.536004]  lookup_slow+0x35/0x50
[85988.537910]  walk_component+0x1c4/0x340
[85988.539882]  ? fuse_permission+0x30/0x150 [fuse]
[85988.541908]  link_path_walk.part.33+0x2a6/0x510
[85988.544042]  ? path_init+0x192/0x320
[85988.545916]  path_lookupat+0x95/0x210
[85988.547837]  filename_lookup+0xb6/0x190
[85988.549753]  ? audit_alloc_name+0x7e/0xd0
[85988.551710]  ? path_get+0x11/0x30
[85988.553669]  ? __audit_getname+0x9f/0xb0
[85988.555655]  ? getname_flags+0xb9/0x1e0
[85988.557672]  ? vfs_statx+0x73/0xe0
[85988.559591]  vfs_statx+0x73/0xe0
[85988.561361]  __do_sys_newfstatat+0x31/0x70
[85988.563200]  ? syscall_trace_enter+0x1df/0x2e0
[85988.565182]  ? __audit_syscall_exit+0x238/0x2c0
[85988.567047]  do_syscall_64+0x5f/0x240
[85988.568865]  entry_SYSCALL_64_after_hwframe+0x5c/0xc1
[85988.571261] INFO: task kodo:2939695 blocked for more than 1200 seconds.
[85988.573951]       Tainted: G           OE     4.19.90-2305.1.0.0199.78.uel20.x86_64 #1
[85988.576253] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[85988.578441] kodo            D    0 2939695 2939616 0x00000080
[85988.580330] Call Trace:
[85988.581734]  ? __schedule+0x286/0x740
[85988.583394]  schedule+0x29/0xc0
[85988.584843]  schedule_preempt_disabled+0xa/0x10
[85988.586632]  __mutex_lock.isra.7+0x20b/0x470
[85988.588191]  ? fuse_lock_inode+0x27/0x30 [fuse]
[85988.589818]  fuse_lock_inode+0x27/0x30 [fuse]
[85988.591278]  fuse_lookup+0x46/0x140 [fuse]
[85988.592731]  ? d_alloc_parallel+0x95/0x4d0
[85988.594174]  __lookup_slow+0x97/0x150
[85988.595469]  lookup_slow+0x35/0x50
[85988.596873]  walk_component+0x1c4/0x340
[85988.598236]  ? fuse_permission+0x30/0x150 [fuse]
[85988.599717]  link_path_walk.part.33+0x2a6/0x510
[85988.601101]  ? path_init+0x192/0x320
[85988.602401]  path_lookupat+0x95/0x210
[85988.603898]  filename_lookup+0xb6/0x190
[85988.605247]  ? audit_alloc_name+0x7e/0xd0
[85988.606482]  ? path_get+0x11/0x30
[85988.607660]  ? __audit_getname+0x9f/0xb0
[85988.609270]  ? getname_flags+0xb9/0x1e0
[85988.610547]  ? vfs_statx+0x73/0xe0
[85988.611757]  vfs_statx+0x73/0xe0
[85988.612875]  __do_sys_newfstatat+0x31/0x70
[85988.615046]  ? syscall_trace_enter+0x1df/0x2e0
[85988.616437]  ? __audit_syscall_exit+0x238/0x2c0
[85988.617825]  do_syscall_64+0x5f/0x240
[85988.619091]  entry_SYSCALL_64_after_hwframe+0x5c/0xc1
[85988.620778] Kernel panic - not syncing: hung_task: blocked tasks
[85988.622425] CPU: 15 PID: 175 Comm: khungtaskd Kdump: loaded Tainted: G           OE     4.19.90-2305.1.0.0199.78.uel20.x86_64 #1
[85988.625743] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 1.11.0-2.el7 04/01/2014
[85988.627659] Call Trace:
[85988.628806]  dump_stack+0x66/0x8b
[85988.630119]  panic+0x106/0x2b6
[85988.631539]  watchdog+0x270/0x400
[85988.632777]  ? hungtask_pm_notify+0x40/0x40
[85988.634134]  kthread+0x113/0x130
[85988.635459]  ? kthread_create_worker_on_cpu+0x70/0x70
[85988.636981]  ret_from_fork+0x35/0x40

Analysis of the oom-kill log

The relevant log excerpt:

[75834.243209] kodo invoked oom-killer: gfp_mask=0x600040(GFP_NOFS), order=0, oom_score_adj=968
[75834.245657] CPU: 0 PID: 23476 Comm: kodo Kdump: loaded Tainted: G           OE     4.19.90-2305.1.0.0199.78.uel20.x86_64 #1
[75834.248210] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 1.11.0-2.el7 04/01/2014
[75834.250623] Call Trace:
[75834.252090]  dump_stack+0x66/0x8b
[75834.253680]  dump_header+0x4a/0x1ec
[75834.255234]  oom_kill_process+0x24f/0x270
[75834.257018]  out_of_memory+0x141/0x570
[75834.259117]  mem_cgroup_out_of_memory+0xb5/0xd0
[75834.260763]  try_charge+0x723/0x770
[75834.262496]  ? mem_cgroup_commit_charge+0x7f/0x4e0
[75834.264713]  mem_cgroup_try_charge+0x86/0x180
[75834.266306]  __add_to_page_cache_locked+0x60/0x290
[75834.268318]  add_to_page_cache_lru+0x4a/0xf0
[75834.270041]  iomap_readpages_actor+0x129/0x2a0
[75834.271760]  ? iomap_dio_bio_end_io+0x190/0x190
[75834.273816]  iomap_apply+0xba/0x160
[75834.275765]  ? iomap_dio_bio_end_io+0x190/0x190
[75834.277348]  iomap_readpages+0xaa/0x1e0
[75834.279000]  ? iomap_dio_bio_end_io+0x190/0x190
[75834.280679]  read_pages+0x6d/0x1d0
[75834.282123]  ? __do_page_cache_readahead+0x16c/0x1d0
[75834.283745]  __do_page_cache_readahead+0x16c/0x1d0
[75834.285347]  filemap_fault+0x298/0x8a0
[75834.286755]  ? kmem_cache_free+0x180/0x1b0
[75834.288988]  __xfs_filemap_fault+0x72/0x200 [xfs]
[75834.290618]  __do_fault+0x33/0x110
[75834.291988]  do_fault+0x12e/0x490
[75834.293451]  __handle_mm_fault+0x613/0x690
[75834.295491]  handle_mm_fault+0xc4/0x200
[75834.296884]  __do_page_fault+0x240/0x4c0
[75834.298539]  do_page_fault+0x31/0x130
[75834.300068]  ? async_page_fault+0x8/0x30
[75834.301720]  async_page_fault+0x1e/0x30
[75834.303468] memory: usage 12582792kB, limit 12582912kB, failcnt 317157
[75834.305486] memory+swap: usage 12582792kB, limit 9007199254740988kB, failcnt 0
[75834.308073] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
[75834.310515] Memory cgroup stats for /kubepods/burstable/podd5e7b3e0-de6a-4965-91c6-7cd399c77b7a: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
[75834.317024] Memory cgroup stats for /kubepods/burstable/podd5e7b3e0-de6a-4965-91c6-7cd399c77b7a/5feef662206c588f4751444e30c4257c1dfe6f62bec8d5c20bec457186b70fe7: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
[75834.324632] Memory cgroup stats for /kubepods/burstable/podd5e7b3e0-de6a-4965-91c6-7cd399c77b7a/4e74f074587671f5e770d3f8071c630a70ede73ee423d59a6dd49149c3a6c734: cache:17524KB rss:12562956KB rss_huge:6912000KB shmem:0KB mapped_file:1188KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:12562956KB inactive_file:16140KB active_file:12KB unevictable:0KB

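Part 1: the kodo process invoked the oom-killer. The cause is not a system-wide memory shortage: the call trace goes through mem_cgroup_out_of_memory, so it was kodo's memory cgroup that ran out of memory.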

[75834.243209] kodo invoked oom-killer: gfp_mask=0x600040(GFP_NOFS), order=0, oom_score_adj=968
[75834.245657] CPU: 0 PID: 23476 Comm: kodo Kdump: loaded Tainted: G           OE     4.19.90-2305.1.0.0199.78.uel20.x86_64 #1

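Part 2: the call trace shows in detail how the kernel reached the OOM killer. Reading from the bottom up: a page fault on an XFS file mapping (__xfs_filemap_fault) started readahead, charging the new page-cache page to the cgroup failed in try_charge, and mem_cgroup_out_of_memory then called oom_kill_process to reclaim memory.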

[75834.250623] Call Trace:
[75834.252090]  dump_stack+0x66/0x8b
[75834.253680]  dump_header+0x4a/0x1ec
[75834.255234]  oom_kill_process+0x24f/0x270
[75834.257018]  out_of_memory+0x141/0x570
[75834.259117]  mem_cgroup_out_of_memory+0xb5/0xd0
[75834.260763]  try_charge+0x723/0x770
[75834.262496]  ? mem_cgroup_commit_charge+0x7f/0x4e0
[75834.264713]  mem_cgroup_try_charge+0x86/0x180
[75834.266306]  __add_to_page_cache_locked+0x60/0x290
[75834.268318]  add_to_page_cache_lru+0x4a/0xf0
[75834.270041]  iomap_readpages_actor+0x129/0x2a0
[75834.271760]  ? iomap_dio_bio_end_io+0x190/0x190
[75834.273816]  iomap_apply+0xba/0x160
[75834.275765]  ? iomap_dio_bio_end_io+0x190/0x190
[75834.277348]  iomap_readpages+0xaa/0x1e0
[75834.279000]  ? iomap_dio_bio_end_io+0x190/0x190
[75834.280679]  read_pages+0x6d/0x1d0
[75834.282123]  ? __do_page_cache_readahead+0x16c/0x1d0
[75834.283745]  __do_page_cache_readahead+0x16c/0x1d0
[75834.285347]  filemap_fault+0x298/0x8a0
[75834.286755]  ? kmem_cache_free+0x180/0x1b0
[75834.288988]  __xfs_filemap_fault+0x72/0x200 [xfs]
[75834.290618]  __do_fault+0x33/0x110
[75834.291988]  do_fault+0x12e/0x490
[75834.293451]  __handle_mm_fault+0x613/0x690
[75834.295491]  handle_mm_fault+0xc4/0x200
[75834.296884]  __do_page_fault+0x240/0x4c0
[75834.298539]  do_page_fault+0x31/0x130
[75834.300068]  ? async_page_fault+0x8/0x30
[75834.301720]  async_page_fault+0x1e/0x30

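Part 3: the memory limit is 12 GiB, current usage has reached that limit, and the limit has been hit more than 310,000 times.

Memory:
usage 12582792kB: current memory usage is 12,582,792 kB.
limit 12582912kB: the memory limit is 12,582,912 kB, which is exactly 12 GiB.
failcnt 317157: memory usage hit the limit 317,157 times.

Memory + swap:
usage 12582792kB: current usage of memory plus swap.
limit 9007199254740988kB: the limit is so large that it is effectively unlimited.
failcnt 0: the memory+swap limit has never been hit.

Kernel memory:
usage 0kB: kernel memory usage is 0 kB.
limit 9007199254740988kB: the kernel memory limit is effectively unlimited.
failcnt 0: the kernel memory limit has never been hit.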

[75834.303468] memory: usage 12582792kB, limit 12582912kB, failcnt 317157
[75834.305486] memory+swap: usage 12582792kB, limit 9007199254740988kB, failcnt 0
[75834.308073] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
[75834.310515] Memory cgroup stats for /kubepods/burstable/podd5e7b3e0-de6a-4965-91c6-7cd399c77b7a: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
[75834.317024] Memory cgroup stats for /kubepods/burstable/podd5e7b3e0-de6a-4965-91c6-7cd399c77b7a/5feef662206c588f4751444e30c4257c1dfe6f62bec8d5c20bec457186b70fe7: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
[75834.324632] Memory cgroup stats for /kubepods/burstable/podd5e7b3e0-de6a-4965-91c6-7cd399c77b7a/4e74f074587671f5e770d3f8071c630a70ede73ee423d59a6dd49149c3a6c734: cache:17524KB rss:12562956KB rss_huge:6912000KB shmem:0KB mapped_file:1188KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:12562956KB inactive_file:16140KB active_file:12KB unevictable:0KB
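
The counter lines can also be read programmatically. The following is a minimal Python sketch, not part of the original analysis: the regular expression and the helper names (parse_counters, kb_to_gib) are illustrative assumptions, and the input is the three counter lines copied from the log.

```python
import re

# Counter lines copied from the OOM log excerpt.
LOG = """\
[75834.303468] memory: usage 12582792kB, limit 12582912kB, failcnt 317157
[75834.305486] memory+swap: usage 12582792kB, limit 9007199254740988kB, failcnt 0
[75834.308073] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
"""

# Matches "<name>: usage <n>kB, limit <n>kB, failcnt <n>" after the timestamp.
COUNTER_RE = re.compile(
    r"\]\s+(?P<name>[\w+]+): usage (?P<usage>\d+)kB, "
    r"limit (?P<limit>\d+)kB, failcnt (?P<failcnt>\d+)"
)


def parse_counters(text):
    """Return {counter name: {"usage", "limit", "failcnt"}}, sizes in kB."""
    counters = {}
    for m in COUNTER_RE.finditer(text):
        counters[m["name"]] = {k: int(m[k]) for k in ("usage", "limit", "failcnt")}
    return counters


def kb_to_gib(kb):
    """Convert kB (1024-byte units, as printed by the kernel) to GiB."""
    return kb / (1024 * 1024)


counters = parse_counters(LOG)
mem = counters["memory"]
headroom_kb = mem["limit"] - mem["usage"]

print(f"limit    = {kb_to_gib(mem['limit']):.2f} GiB")
print(f"usage    = {kb_to_gib(mem['usage']):.4f} GiB")
print(f"headroom = {headroom_kb} kB, limit hit {mem['failcnt']} times")
```

The headroom left under the limit is only a few dozen pages, which is why a single page-cache charge was enough to trigger the OOM killer.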

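Summary

1. The Kubernetes pod has a memory limit of 12 GiB, and the memory used inside the pod reached that limit.
2. The log shows that the pod's cgroup memory limit made the memory charge fail and triggered the kernel OOM killer. kodo (PID 23362, score 1968) was selected as the best victim and killed so that the rest of the system could keep running.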
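
As a cross-check, the "Tasks state" table in the full log reports memory in pages, while the "Killed process" line reports kB. The sketch below converts between the two; the 4 KiB page size is an assumption (the usual base page size on x86_64), and the variable names are illustrative.

```python
PAGE_SIZE_KB = 4  # assumption: 4 KiB base pages, the usual x86_64 value

# Values copied from the log for kodo (PID 23362).
rss_pages = 3140655              # "rss" column of the Tasks state table
total_vm_pages = 3470438         # "total_vm" column of the Tasks state table
reported_anon_rss_kb = 12562620  # "anon-rss" on the "Killed process" line
reported_total_vm_kb = 13881752  # "total-vm" on the "Killed process" line
limit_kb = 12582912              # cgroup memory limit

rss_kb = rss_pages * PAGE_SIZE_KB
total_vm_kb = total_vm_pages * PAGE_SIZE_KB
share_of_limit = rss_kb / limit_kb

print(f"rss      = {rss_kb} kB (log reports {reported_anon_rss_kb} kB)")
print(f"total_vm = {total_vm_kb} kB (log reports {reported_total_vm_kb} kB)")
print(f"kodo holds {share_of_limit:.1%} of the cgroup limit")
```

Both converted values agree with the kB figures printed by the kernel, and kodo alone accounts for nearly all of the pod's limit.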

Analysis of the hung_panic log

The relevant log excerpt:

[85988.571261] INFO: task kodo:2939695 blocked for more than 1200 seconds.
[85988.573951]       Tainted: G           OE     4.19.90-2305.1.0.0199.78.uel20.x86_64 #1
[85988.576253] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[85988.578441] kodo            D    0 2939695 2939616 0x00000080
[85988.580330] Call Trace:
[85988.581734]  ? __schedule+0x286/0x740
[85988.583394]  schedule+0x29/0xc0
[85988.584843]  schedule_preempt_disabled+0xa/0x10
[85988.586632]  __mutex_lock.isra.7+0x20b/0x470
[85988.588191]  ? fuse_lock_inode+0x27/0x30 [fuse]
[85988.589818]  fuse_lock_inode+0x27/0x30 [fuse]
[85988.591278]  fuse_lookup+0x46/0x140 [fuse]
[85988.592731]  ? d_alloc_parallel+0x95/0x4d0
[85988.594174]  __lookup_slow+0x97/0x150
[85988.595469]  lookup_slow+0x35/0x50
[85988.596873]  walk_component+0x1c4/0x340
[85988.598236]  ? fuse_permission+0x30/0x150 [fuse]
[85988.599717]  link_path_walk.part.33+0x2a6/0x510
[85988.601101]  ? path_init+0x192/0x320
[85988.602401]  path_lookupat+0x95/0x210
[85988.603898]  filename_lookup+0xb6/0x190
[85988.605247]  ? audit_alloc_name+0x7e/0xd0
[85988.606482]  ? path_get+0x11/0x30
[85988.607660]  ? __audit_getname+0x9f/0xb0
[85988.609270]  ? getname_flags+0xb9/0x1e0
[85988.610547]  ? vfs_statx+0x73/0xe0
[85988.611757]  vfs_statx+0x73/0xe0
[85988.612875]  __do_sys_newfstatat+0x31/0x70
[85988.615046]  ? syscall_trace_enter+0x1df/0x2e0
[85988.616437]  ? __audit_syscall_exit+0x238/0x2c0
[85988.617825]  do_syscall_64+0x5f/0x240
[85988.619091]  entry_SYSCALL_64_after_hwframe+0x5c/0xc1
[85988.620778] Kernel panic - not syncing: hung_task: blocked tasks
[85988.622425] CPU: 15 PID: 175 Comm: khungtaskd Kdump: loaded Tainted: G           OE     4.19.90-2305.1.0.0199.78.uel20.x86_64 #1
[85988.625743] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 1.11.0-2.el7 04/01/2014
[85988.627659] Call Trace:
[85988.628806]  dump_stack+0x66/0x8b
[85988.630119]  panic+0x106/0x2b6
[85988.631539]  watchdog+0x270/0x400
[85988.632777]  ? hungtask_pm_notify+0x40/0x40
[85988.634134]  kthread+0x113/0x130
[85988.635459]  ? kthread_create_worker_on_cpu+0x70/0x70
[85988.636981]  ret_from_fork+0x35/0x40

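Part 1:
Task kodo:2939695 stayed blocked (state D, uninterruptible sleep) for more than 1200 seconds and was therefore reported as a hung task. The log also notes that running "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this check. The timeout on this machine is 1200 seconds; the usual kernel default is 120 seconds, so it appears to have been raised here.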

[85988.571261] INFO: task kodo:2939695 blocked for more than 1200 seconds.
[85988.573951]       Tainted: G           OE     4.19.90-2305.1.0.0199.78.uel20.x86_64 #1
[85988.576253] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[85988.578441] kodo            D    0 2939695 2939616 0x00000080
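
To see how hung-task detection is configured on a live machine, the two sysctl files can be read directly. This is a minimal sketch rather than a prescribed procedure: the helper names are illustrative, and it assumes a Linux kernel built with hung-task detection, returning None for files that are absent.

```python
from pathlib import Path

# Sysctl files under /proc/sys/kernel that control hung-task detection.
SYSCTL_NAMES = ("hung_task_timeout_secs", "hung_task_panic")


def read_hung_task_settings(base="/proc/sys/kernel"):
    """Return {name: int or None}; None means the file is missing or unreadable."""
    settings = {}
    for name in SYSCTL_NAMES:
        try:
            settings[name] = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            settings[name] = None
    return settings


def describe(settings):
    """Summarize what the kernel does when a task stays blocked too long."""
    timeout = settings.get("hung_task_timeout_secs")
    panic = settings.get("hung_task_panic")
    if timeout is None:
        return "hung task detection not available"
    if timeout == 0:
        return "hung task detection disabled"
    action = "panic" if panic == 1 else "warn only"
    return f"tasks blocked for more than {timeout}s: {action}"


print(describe(read_hung_task_settings()))
```

On the machine in this article the result would presumably report a 1200-second timeout with panic enabled, since the log shows both the warning and the panic.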

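Part 2:
A kernel panic was triggered by the khungtaskd kernel thread (PID 175) on CPU 15, with the message "hung_task: blocked tasks".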

[85988.620778] Kernel panic - not syncing: hung_task: blocked tasks
[85988.622425] CPU: 15 PID: 175 Comm: khungtaskd Kdump: loaded Tainted: G           OE     4.19.90-2305.1.0.0199.78.uel20.x86_64 #1

Part 3:
The call trace of the panic: the khungtaskd kernel thread runs the watchdog function, which called panic.

[85988.627659] Call Trace:
[85988.628806]  dump_stack+0x66/0x8b
[85988.630119]  panic+0x106/0x2b6
[85988.631539]  watchdog+0x270/0x400
[85988.632777]  ? hungtask_pm_notify+0x40/0x40
[85988.634134]  kthread+0x113/0x130
[85988.635459]  ? kthread_create_worker_on_cpu+0x70/0x70
[85988.636981]  ret_from_fork+0x35/0x40
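
In these traces, frames prefixed with "?" are addresses found on the stack that the unwinder could not confirm as part of the call chain, so they are best ignored at first. The sketch below filters them out of the hung-task trace of kodo:2939695 to show where the task is really waiting; the regular expression and function names are illustrative assumptions, and only the first few frames of the trace are included.

```python
import re

# First frames copied from the hung-task trace of kodo:2939695.
TRACE = """\
[85988.581734]  ? __schedule+0x286/0x740
[85988.583394]  schedule+0x29/0xc0
[85988.584843]  schedule_preempt_disabled+0xa/0x10
[85988.586632]  __mutex_lock.isra.7+0x20b/0x470
[85988.588191]  ? fuse_lock_inode+0x27/0x30 [fuse]
[85988.589818]  fuse_lock_inode+0x27/0x30 [fuse]
[85988.591278]  fuse_lookup+0x46/0x140 [fuse]
"""

# "[ts]  [? ]symbol+0xoff/0xsize [module]"
FRAME_RE = re.compile(
    r"^\[\s*[\d.]+\]\s+(?P<guess>\? )?(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)\])?\s*$"
)


def reliable_frames(trace):
    """Return (symbol, module) for every frame not marked with '?'."""
    frames = []
    for line in trace.splitlines():
        m = FRAME_RE.match(line)
        if m and not m["guess"]:
            frames.append((m["sym"], m["module"]))
    return frames


frames = reliable_frames(TRACE)
# The first frame that belongs to a loadable module is the mutex's caller.
first_module_frame = next((f for f in frames if f[1]), None)

for sym, module in frames:
    print(sym, f"[{module}]" if module else "")
print("waiting in:", first_module_frame)
```

The reliable frames read schedule, __mutex_lock, fuse_lock_inode, fuse_lookup: the task is asleep waiting for a mutex taken in the fuse module during a path lookup.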

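Analysis of the vmcore generated by the panic

System information printed when the vmcore is opened (the format is that of the crash utility):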

      KERNEL: vmlinux  [TAINTED]                                       
    DUMPFILE: /root/vmcore  [PARTIAL DUMP]
        CPUS: 32
        DATE: Sat Aug 10 02:05:30 CST 2024
      UPTIME: 23:53:08
LOAD AVERAGE: 36.80, 28.43, 21.99
       TASKS: 2151
    NODENAME: tcs-30-34-22-251
     RELEASE: 4.19.90-2305.1.0.0199.78.uel20.x86_64
     VERSION: #1 SMP Wed Feb 28 12:31:25 CST 2024
     MACHINE: x86_64  (2699 Mhz)
      MEMORY: 64 GB
       PANIC: "Kernel panic - not syncing: hung_task: blocked tasks"
         PID: 175
     COMMAND: "khungtaskd"
        TASK: ffff9a2c46e2b000  [THREAD_INFO: ffff9a2c46e2b000]
         CPU: 15
       STATE: TASK_RUNNING (PANIC)

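Explanation:

KERNEL: the kernel image (vmlinux) used for the analysis. [TAINTED] means the kernel was tainted; the "G OE" flags in the log indicate that an out-of-tree module (O) and an unsigned module (E) were loaded.
DUMPFILE: location of the crash dump file. [PARTIAL DUMP] means some pages were filtered out when the dump was written, so the file does not contain all of memory.
CPUS: the system has 32 CPUs.
UPTIME: the system had been running for 23 hours 53 minutes.
LOAD AVERAGE: the system load averaged over 1, 5, and 15 minutes; the load was high.
TASKS: the total number of tasks in the system is 2151.
NODENAME: the host name.
RELEASE: the kernel release.
VERSION: the kernel build number and build time.
MACHINE: the machine architecture and CPU clock speed.
MEMORY: the system has 64 GB of memory.
PANIC: the kernel panic message. The panic was caused by hung_task (hung tasks); "not syncing" means the kernel did not flush dirty buffers to disk before stopping.
PID: the PID of the panicking task is 175.
COMMAND: the panicking command is khungtaskd, the kernel thread that detects hung tasks.
TASK: the address of the panicking task's task_struct.
CPU: the panic happened on CPU 15.
STATE: the task was in TASK_RUNNING state when it panicked.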
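
The UPTIME field can be tied back to the timestamps in the kernel log, which count seconds since boot. A small worked example in Python follows; the helper name is an illustrative assumption, and it only handles the HH:MM:SS form shown here (uptimes of a day or more are printed with a days prefix).

```python
def hms_to_seconds(hms):
    """Convert an HH:MM:SS uptime string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


uptime_s = hms_to_seconds("23:53:08")  # UPTIME from the vmcore header
panic_ts = 85988.620778                # "Kernel panic" line in the log
oom_ts = 75834.243209                  # "invoked oom-killer" line in the log

gap_s = panic_ts - oom_ts

print(f"uptime at dump  = {uptime_s} s")
print(f"panic timestamp = {panic_ts:.0f} s")
print(f"OOM kill to panic = {gap_s:.0f} s ({gap_s / 3600:.2f} h)")
```

The uptime in seconds matches the whole-second part of the panic timestamp, which confirms that the vmcore was captured at the moment of the panic in the log. It also shows that the OOM kill happened well over two hours before the panic.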

Kernel stack of the panicking task:

PID: 175    TASK: ffff9a2c46e2b000  CPU: 15  COMMAND: "khungtaskd"
 0 [ffff9a303c0b7d18] machine_kexec at ffffffffb6857b0f
 1 [ffff9a303c0b7d70] __crash_kexec at ffffffffb695b981
 2 [ffff9a303c0b7e30] panic at ffffffffb68b0c70
 3 [ffff9a303c0b7eb8] watchdog at ffffffffb698f5e0
 4 [ffff9a303c0b7f10] kthread at ffffffffb68d54e3
 5 [ffff9a303c0b7f50] ret_from_fork at ffffffffb7400245

Explanation:
The panic was triggered by PID 175 (command khungtaskd) running on CPU 15.

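Summary

1. The kodo tasks stayed blocked for more than 1200 seconds, and the kernel printed hung-task warnings to the log.
2. The khungtaskd kernel thread detected the tasks that had been hung for too long and called panic. khungtaskd only panics when kernel.hung_task_panic is enabled, so that setting was presumably on. The high load average is consistent with many tasks being stuck in D state, which Linux counts toward the load.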

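Detailed answer

From the OOM kill to the hung-task reports, the log keeps pointing at the kodo process, so kodo is most likely behind the high load and the panic. The hung kodo threads are all blocked in D state waiting for a mutex in fuse_lock_inode during a FUSE lookup, so the next thing to check is what is holding up the FUSE filesystem that kodo accesses.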
