昨天偶尔在跑测试的时候发现uksmd的一个极其罕见出现的bug(在一个很难进入的路径上,很久很久没有碰到类似的问题了),本质上属于一个soft lockup的bug。所谓,soft lockup就是说,这个bug没有让系统彻底死机,但是若干个进程(或者kernel thread)被锁死在了某个状态(一般在内核区域)。很多情况下这个是由于内核锁的使用的问题。本系列新手教程帖子,教大家如何查看内核出错信息。
而本帖主要是教大家如何分析错误现场(当然了,你要结合代码)
我们先来看一下这次错误的dmesg输出:
复制代码
我们可以看到内核提示“task uksmd:81 blocked for more than 120 seconds.”,这告诉我们,uksmd这个进程停止在某处很久了。然后内核打印了这个进程的栈和Call Trace(也就是函数调用嵌套情况),我们重点关注的是Call Trace:。
每个Call Trace的格式是由两部分组成:[<压入栈的地址>] 所在函数名称
其中这个地址是当前函数调用的ret地址,或者换句话说,是这个call指令的下一条(这里看不懂的回去好好复习复习C调用规范和汇编 ),而函数名之前可能有 “?”这样一个符号,这个符号告诉我们,这个不是严格的调用框架上的节点,而是栈里面残存的(没有被后来调用的栈框架洗刷掉)一些调用轨迹。这个往往是说明到目前位置,曾经调用过的一些函数路径。
好,我们看一下着重关注的uksmd的情况
复制代码
从上面的轨迹,我们不难看到,这次锁死是出现在一个page lock上,想象一下,这样的情形可能的情况:
为了进一步缩小范围,我们接下来分析日志其它部分:
我们发现同时锁死的有khugepaged, 和 zeusmp_base.amd,后面这个进程是我们benchmark进程,在用户层,所以一般是无辜的。而khugepaged同样是一个页表操纵进程(将碎页合并成大页面),所以首先怀疑是它们两者的dead lock,我们接着看khugepaged的call trace:
复制代码
很明显,锁在一个信号量(semaphore)上,那么这个具体是哪个呢?其实也很明显:mmap_sem是最大的可能性。因为在合并大页面之前,mmap_sem是要加锁,这个很容易想象(其实具体在collapse_huge_page函数里面),而uksmd显然在合并页面的时候也加了这个锁,所以,不难看出khugepaged其实锁在 uksmd正在工作的mmap_sem上。这个根本原因还是uksmd死锁导致的,而不是它们之间的互相 dead lock。
接下来看zeusmp_base.amd被锁死的场景,也是同一个:mmap_sem,所以不难的出这个现场的基本猜测:
uksmd自己有某个页面没有释放锁,而下一次扫描的时候,堵在了自己加的page lock上,而由于它被锁死,没有释放mmap_sem,所以khugepaged在扫描zeusmp_base.amd内存区域的时候被锁死了,而接着zeusmp_base.amd由于page fault同样要mmap_sem加读锁,被堵在khugepaged的写锁操作上。
而本帖主要是教大家如何分析错误现场(当然了,你要结合代码)
我们先来看一下这次错误的dmesg输出:
- Mar 28 00:12:31 localhost kernel: [ 4440.558031] INFO: task uksmd:81 blocked for more than 120 seconds.
- Mar 28 00:12:31 localhost kernel: [ 4440.558043] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
- Mar 28 00:12:31 localhost kernel: [ 4440.558053] uksmd D ffff88011b60a0c0 0 81 2 0 0x00000000
- Mar 28 00:12:31 localhost kernel: [ 4440.558060] ffff88011b62fc80 0000000000000046 0000000000000000 ffffea0001eafd40
- Mar 28 00:12:31 localhost kernel: [ 4440.558066] ffff88011b62fc10 80000000a4a2a025 ffff88010a0ee560 00000001003b892f
- Mar 28 00:12:31 localhost kernel: [ 4440.558071] ffff88011b60a6a0 ffff88011b62ffd8 000000000000f788 ffff88011b60a6a0
- Mar 28 00:12:31 localhost kernel: [ 4440.558076] Call Trace:
- Mar 28 00:12:31 localhost kernel: [ 4440.558086] [<ffffffff8111ec90>] ? sync_page+0x0/0x50
- Mar 28 00:12:31 localhost kernel: [ 4440.558092] [<ffffffff814cfdb3>] io_schedule+0xa3/0x110
- Mar 28 00:12:31 localhost kernel: [ 4440.558096] [<ffffffff8111eccd>] sync_page+0x3d/0x50
- Mar 28 00:12:31 localhost kernel: [ 4440.558100] [<ffffffff814d04ea>] __wait_on_bit_lock+0x5a/0xc0
- Mar 28 00:12:31 localhost kernel: [ 4440.558105] [<ffffffff8111ec67>] __lock_page+0x67/0x70
- Mar 28 00:12:31 localhost kernel: [ 4440.558110] [<ffffffff810932a0>] ? wake_bit_function+0x0/0x50
- Mar 28 00:12:31 localhost kernel: [ 4440.558116] [<ffffffff81177102>] cmp_and_merge_page+0x882/0x1a00
- Mar 28 00:12:31 localhost kernel: [ 4440.558121] [<ffffffff81175ca2>] ? get_next_rmap_item+0x1d2/0xdb0
- Mar 28 00:12:31 localhost kernel: [ 4440.558125] [<ffffffff81178ca6>] ksm_scan_thread+0xa26/0x1120
- Mar 28 00:12:31 localhost kernel: [ 4440.558139] [<ffffffff81178280>] ? ksm_scan_thread+0x0/0x1120
- Mar 28 00:12:31 localhost kernel: [ 4440.558144] [<ffffffff81092c86>] kthread+0x96/0xa0
- Mar 28 00:12:31 localhost kernel: [ 4440.558150] [<ffffffff8100c2ca>] child_rip+0xa/0x20
- Mar 28 00:12:31 localhost kernel: [ 4440.558155] [<ffffffff81092bf0>] ? kthread+0x0/0xa0
- Mar 28 00:12:31 localhost kernel: [ 4440.558159] [<ffffffff8100c2c0>] ? child_rip+0x0/0x20
- Mar 28 00:14:31 localhost kernel: [ 4560.558036] INFO: task uksmd:81 blocked for more than 120 seconds.
- Mar 28 00:14:31 localhost kernel: [ 4560.558047] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
- Mar 28 00:14:31 localhost kernel: [ 4560.558058] uksmd D ffff88011b60a0c0 0 81 2 0 0x00000000
- Mar 28 00:14:31 localhost kernel: [ 4560.558064] ffff88011b62fc80 0000000000000046 0000000000000000 ffffea0001eafd40
- Mar 28 00:14:31 localhost kernel: [ 4560.558069] ffff88011b62fc10 80000000a4a2a025 ffff88010a0ee560 00000001003b892f
- Mar 28 00:14:31 localhost kernel: [ 4560.558074] ffff88011b60a6a0 ffff88011b62ffd8 000000000000f788 ffff88011b60a6a0
- Mar 28 00:14:31 localhost kernel: [ 4560.558079] Call Trace:
- Mar 28 00:14:31 localhost kernel: [ 4560.558087] [<ffffffff8111ec90>] ? sync_page+0x0/0x50
- Mar 28 00:14:31 localhost kernel: [ 4560.558092] [<ffffffff814cfdb3>] io_schedule+0xa3/0x110
- Mar 28 00:14:31 localhost kernel: [ 4560.558096] [<ffffffff8111eccd>] sync_page+0x3d/0x50
- Mar 28 00:14:31 localhost kernel: [ 4560.558100] [<ffffffff814d04ea>] __wait_on_bit_lock+0x5a/0xc0
- Mar 28 00:14:31 localhost kernel: [ 4560.558105] [<ffffffff8111ec67>] __lock_page+0x67/0x70
- Mar 28 00:14:31 localhost kernel: [ 4560.558110] [<ffffffff810932a0>] ? wake_bit_function+0x0/0x50
- Mar 28 00:14:31 localhost kernel: [ 4560.558115] [<ffffffff81177102>] cmp_and_merge_page+0x882/0x1a00
- Mar 28 00:14:31 localhost kernel: [ 4560.558120] [<ffffffff81175ca2>] ? get_next_rmap_item+0x1d2/0xdb0
- Mar 28 00:14:31 localhost kernel: [ 4560.558124] [<ffffffff81178ca6>] ksm_scan_thread+0xa26/0x1120
- Mar 28 00:14:31 localhost kernel: [ 4560.558138] [<ffffffff81178280>] ? ksm_scan_thread+0x0/0x1120
- Mar 28 00:14:31 localhost kernel: [ 4560.558143] [<ffffffff81092c86>] kthread+0x96/0xa0
- Mar 28 00:14:31 localhost kernel: [ 4560.558148] [<ffffffff8100c2ca>] child_rip+0xa/0x20
- Mar 28 00:14:31 localhost kernel: [ 4560.558153] [<ffffffff81092bf0>] ? kthread+0x0/0xa0
- Mar 28 00:14:31 localhost kernel: [ 4560.558157] [<ffffffff8100c2c0>] ? child_rip+0x0/0x20
- Mar 28 00:16:31 localhost kernel: [ 4680.558032] INFO: task khugepaged:60 blocked for more than 120 seconds.
- Mar 28 00:16:31 localhost kernel: [ 4680.558043] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
- Mar 28 00:16:31 localhost kernel: [ 4680.558053] khugepaged D ffff88011b441300 0 60 2 0 0x00000000
- Mar 28 00:16:31 localhost kernel: [ 4680.558059] ffff88011b443c90 0000000000000046 0000000000000000 0000000000000030
- Mar 28 00:16:31 localhost kernel: [ 4680.558065] 000000000001e440 ffff880000015680 ffffffffa1004f40 000000010040db15
- Mar 28 00:16:31 localhost kernel: [ 4680.558070] ffff88011b4418e0 ffff88011b443fd8 000000000000f788 ffff88011b4418e0
- Mar 28 00:16:31 localhost kernel: [ 4680.558075] Call Trace:
- Mar 28 00:16:31 localhost kernel: [ 4680.558088] [<ffffffffa1004f40>] ? do_dbs_timer+0x0/0x384 [cpufreq_ondemand]
- Mar 28 00:16:31 localhost kernel: [ 4680.558095] [<ffffffff814d1a55>] rwsem_down_failed_common+0x95/0x1d0
- Mar 28 00:16:31 localhost kernel: [ 4680.558099] [<ffffffff814d1bb3>] rwsem_down_write_failed+0x23/0x30
- Mar 28 00:16:31 localhost kernel: [ 4680.558105] [<ffffffff8126a1a3>] call_rwsem_down_write_failed+0x13/0x20
- Mar 28 00:16:31 localhost kernel: [ 4680.558109] [<ffffffff814d10b2>] ? down_write+0x32/0x40
- Mar 28 00:16:31 localhost kernel: [ 4680.558114] [<ffffffff81187605>] khugepaged+0x8f5/0x1370
- Mar 28 00:16:31 localhost kernel: [ 4680.558120] [<ffffffff81093260>] ? autoremove_wake_function+0x0/0x40
- Mar 28 00:16:31 localhost kernel: [ 4680.558124] [<ffffffff81186d10>] ? khugepaged+0x0/0x1370
- Mar 28 00:16:31 localhost kernel: [ 4680.558136] [<ffffffff81092c86>] kthread+0x96/0xa0
- Mar 28 00:16:31 localhost kernel: [ 4680.558141] [<ffffffff8100c2ca>] child_rip+0xa/0x20
- Mar 28 00:16:31 localhost kernel: [ 4680.558145] [<ffffffff81092bf0>] ? kthread+0x0/0xa0
- Mar 28 00:16:31 localhost kernel: [ 4680.558150] [<ffffffff8100c2c0>] ? child_rip+0x0/0x20
- Mar 28 00:16:31 localhost kernel: [ 4680.558154] INFO: task uksmd:81 blocked for more than 120 seconds.
- Mar 28 00:16:31 localhost kernel: [ 4680.558162] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
- Mar 28 00:16:31 localhost kernel: [ 4680.558172] uksmd D ffff88011b60a0c0 0 81 2 0 0x00000000
- Mar 28 00:16:31 localhost kernel: [ 4680.558178] ffff88011b62fc80 0000000000000046 0000000000000000 ffffea0001eafd40
- Mar 28 00:16:31 localhost kernel: [ 4680.558186] ffff88011b62fc10 80000000a4a2a025 ffff88010a0ee560 00000001003b892f
- Mar 28 00:16:31 localhost kernel: [ 4680.558192] ffff88011b60a6a0 ffff88011b62ffd8 000000000000f788 ffff88011b60a6a0
- Mar 28 00:16:31 localhost kernel: [ 4680.558199] Call Trace:
- Mar 28 00:16:31 localhost kernel: [ 4680.558205] [<ffffffff8111ec90>] ? sync_page+0x0/0x50
- Mar 28 00:16:31 localhost kernel: [ 4680.558210] [<ffffffff814cfdb3>] io_schedule+0xa3/0x110
- Mar 28 00:16:31 localhost kernel: [ 4680.558214] [<ffffffff8111eccd>] sync_page+0x3d/0x50
- Mar 28 00:16:31 localhost kernel: [ 4680.558219] [<ffffffff814d04ea>] __wait_on_bit_lock+0x5a/0xc0
- Mar 28 00:16:31 localhost kernel: [ 4680.558224] [<ffffffff8111ec67>] __lock_page+0x67/0x70
- Mar 28 00:16:31 localhost kernel: [ 4680.558229] [<ffffffff810932a0>] ? wake_bit_function+0x0/0x50
- Mar 28 00:16:31 localhost kernel: [ 4680.558234] [<ffffffff81177102>] cmp_and_merge_page+0x882/0x1a00
- Mar 28 00:16:31 localhost kernel: [ 4680.558239] [<ffffffff81175ca2>] ? get_next_rmap_item+0x1d2/0xdb0
- Mar 28 00:16:31 localhost kernel: [ 4680.558244] [<ffffffff81178ca6>] ksm_scan_thread+0xa26/0x1120
- Mar 28 00:16:31 localhost kernel: [ 4680.558249] [<ffffffff81178280>] ? ksm_scan_thread+0x0/0x1120
- Mar 28 00:16:31 localhost kernel: [ 4680.558254] [<ffffffff81092c86>] kthread+0x96/0xa0
- Mar 28 00:16:31 localhost kernel: [ 4680.558258] [<ffffffff8100c2ca>] child_rip+0xa/0x20
- Mar 28 00:16:31 localhost kernel: [ 4680.558263] [<ffffffff81092bf0>] ? kthread+0x0/0xa0
- Mar 28 00:16:31 localhost kernel: [ 4680.558267] [<ffffffff8100c2c0>] ? child_rip+0x0/0x20
- Mar 28 00:18:31 localhost kernel: [ 4800.558031] INFO: task khugepaged:60 blocked for more than 120 seconds.
- Mar 28 00:18:31 localhost kernel: [ 4800.558042] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
- Mar 28 00:18:31 localhost kernel: [ 4800.558177] khugepaged D ffff88011b441300 0 60 2 0 0x00000000
- Mar 28 00:18:31 localhost kernel: [ 4800.558184] ffff88011b443c90 0000000000000046 0000000000000000 0000000000000030
- Mar 28 00:18:31 localhost kernel: [ 4800.558191] 000000000001e440 ffff880000015680 ffffffffa1004f40 000000010040db15
- Mar 28 00:18:31 localhost kernel: [ 4800.558199] ffff88011b4418e0 ffff88011b443fd8 000000000000f788 ffff88011b4418e0
- Mar 28 00:18:31 localhost kernel: [ 4800.558206] Call Trace:
- Mar 28 00:18:31 localhost kernel: [ 4800.558216] [<ffffffffa1004f40>] ? do_dbs_timer+0x0/0x384 [cpufreq_ondemand]
- Mar 28 00:18:31 localhost kernel: [ 4800.558224] [<ffffffff814d1a55>] rwsem_down_failed_common+0x95/0x1d0
- Mar 28 00:18:31 localhost kernel: [ 4800.558229] [<ffffffff814d1bb3>] rwsem_down_write_failed+0x23/0x30
- Mar 28 00:18:31 localhost kernel: [ 4800.558234] [<ffffffff8126a1a3>] call_rwsem_down_write_failed+0x13/0x20
- Mar 28 00:18:31 localhost kernel: [ 4800.558239] [<ffffffff814d10b2>] ? down_write+0x32/0x40
- Mar 28 00:18:31 localhost kernel: [ 4800.558245] [<ffffffff81187605>] khugepaged+0x8f5/0x1370
- Mar 28 00:18:31 localhost kernel: [ 4800.558251] [<ffffffff81093260>] ? autoremove_wake_function+0x0/0x40
- Mar 28 00:18:31 localhost kernel: [ 4800.558256] [<ffffffff81186d10>] ? khugepaged+0x0/0x1370
- Mar 28 00:18:31 localhost kernel: [ 4800.558261] [<ffffffff81092c86>] kthread+0x96/0xa0
- Mar 28 00:18:31 localhost kernel: [ 4800.558266] [<ffffffff8100c2ca>] child_rip+0xa/0x20
- Mar 28 00:18:31 localhost kernel: [ 4800.558270] [<ffffffff81092bf0>] ? kthread+0x0/0xa0
- Mar 28 00:18:31 localhost kernel: [ 4800.558274] [<ffffffff8100c2c0>] ? child_rip+0x0/0x20
- Mar 28 00:18:31 localhost kernel: [ 4800.558279] INFO: task uksmd:81 blocked for more than 120 seconds.
- Mar 28 00:18:31 localhost kernel: [ 4800.558347] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
- Mar 28 00:18:31 localhost kernel: [ 4800.558476] uksmd D ffff88011b60a0c0 0 81 2 0 0x00000000
- Mar 28 00:18:31 localhost kernel: [ 4800.558483] ffff88011b62fc80 0000000000000046 0000000000000000 ffffea0001eafd40
- Mar 28 00:18:31 localhost kernel: [ 4800.558490] ffff88011b62fc10 80000000a4a2a025 ffff88010a0ee560 00000001003b892f
- Mar 28 00:18:31 localhost kernel: [ 4800.558497] ffff88011b60a6a0 ffff88011b62ffd8 000000000000f788 ffff88011b60a6a0
- Mar 28 00:18:31 localhost kernel: [ 4800.558504] Call Trace:
- Mar 28 00:18:31 localhost kernel: [ 4800.558509] [<ffffffff8111ec90>] ? sync_page+0x0/0x50
- Mar 28 00:18:31 localhost kernel: [ 4800.558513] [<ffffffff814cfdb3>] io_schedule+0xa3/0x110
- Mar 28 00:18:31 localhost kernel: [ 4800.558518] [<ffffffff8111eccd>] sync_page+0x3d/0x50
- Mar 28 00:18:31 localhost kernel: [ 4800.558523] [<ffffffff814d04ea>] __wait_on_bit_lock+0x5a/0xc0
- Mar 28 00:18:31 localhost kernel: [ 4800.558528] [<ffffffff8111ec67>] __lock_page+0x67/0x70
- Mar 28 00:18:31 localhost kernel: [ 4800.558532] [<ffffffff810932a0>] ? wake_bit_function+0x0/0x50
- Mar 28 00:18:31 localhost kernel: [ 4800.558538] [<ffffffff81177102>] cmp_and_merge_page+0x882/0x1a00
- Mar 28 00:18:31 localhost kernel: [ 4800.558543] [<ffffffff81175ca2>] ? get_next_rmap_item+0x1d2/0xdb0
- Mar 28 00:18:31 localhost kernel: [ 4800.558548] [<ffffffff81178ca6>] ksm_scan_thread+0xa26/0x1120
- Mar 28 00:18:31 localhost kernel: [ 4800.558553] [<ffffffff81178280>] ? ksm_scan_thread+0x0/0x1120
- Mar 28 00:18:31 localhost kernel: [ 4800.558557] [<ffffffff81092c86>] kthread+0x96/0xa0
- Mar 28 00:18:31 localhost kernel: [ 4800.558562] [<ffffffff8100c2ca>] child_rip+0xa/0x20
- Mar 28 00:18:31 localhost kernel: [ 4800.558566] [<ffffffff81092bf0>] ? kthread+0x0/0xa0
- Mar 28 00:18:31 localhost kernel: [ 4800.558570] [<ffffffff8100c2c0>] ? child_rip+0x0/0x20
- Mar 28 00:18:31 localhost kernel: [ 4800.558584] INFO: task zeusmp_base.amd:5957 blocked for more than 120 seconds.
- Mar 28 00:18:31 localhost kernel: [ 4800.558711] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
- Mar 28 00:18:31 localhost kernel: [ 4800.558838] zeusmp_base.a D ffff88011a05b0c0 0 5957 5956 0 0x00000080
- Mar 28 00:18:31 localhost kernel: [ 4800.558843] ffff880084875cf0 0000000000000082 ffff880084875cc8 000000000f4775e0
- Mar 28 00:18:31 localhost kernel: [ 4800.558849] ffff8800a955d7b8 ffffea0000000000 ffff8800379d4301 ffff88011add9980
- Mar 28 00:18:31 localhost kernel: [ 4800.558853] ffff88011a05b6a0 ffff880084875fd8 000000000000f788 ffff88011a05b6a0
- Mar 28 00:18:31 localhost kernel: [ 4800.558858] Call Trace:
- Mar 28 00:18:31 localhost kernel: [ 4800.558862] [<ffffffff814d1a55>] rwsem_down_failed_common+0x95/0x1d0
- Mar 28 00:18:31 localhost kernel: [ 4800.558867] [<ffffffff814d1be6>] rwsem_down_read_failed+0x26/0x30
- Mar 28 00:18:31 localhost kernel: [ 4800.558871] [<ffffffff8126a174>] call_rwsem_down_read_failed+0x14/0x30
- Mar 28 00:18:31 localhost kernel: [ 4800.558876] [<ffffffff814d10e4>] ? down_read+0x24/0x30
- Mar 28 00:18:31 localhost kernel: [ 4800.558880] [<ffffffff810414a7>] __do_page_fault+0x187/0x480
- Mar 28 00:18:31 localhost kernel: [ 4800.558885] [<ffffffff810098ee>] ? __switch_to+0x26e/0x320
- Mar 28 00:18:31 localhost kernel: [ 4800.558889] [<ffffffff814cf502>] ? thread_return+0x4e/0x85c
- Mar 28 00:18:31 localhost kernel: [ 4800.558893] [<ffffffff814d4f6e>] do_page_fault+0x3e/0xa0
- Mar 28 00:18:31 localhost kernel: [ 4800.558897] [<ffffffff814d22e5>] page_fault+0x25/0x30
- Mar 28 00:20:31 localhost kernel: [ 4920.558036] INFO: task khugepaged:60 blocked for more than 120 seconds.
- Mar 28 00:20:31 localhost kernel: [ 4920.558108] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
- Mar 28 00:20:31 localhost kernel: [ 4920.558243] khugepaged D ffff88011b441300 0 60 2 0 0x00000000
- Mar 28 00:20:31 localhost kernel: [ 4920.558251] ffff88011b443c90 0000000000000046 0000000000000000 0000000000000030
- Mar 28 00:20:31 localhost kernel: [ 4920.558259] 000000000001e440 ffff880000015680 ffffffffa1004f40 000000010040db15
- Mar 28 00:20:31 localhost kernel: [ 4920.558266] ffff88011b4418e0 ffff88011b443fd8 000000000000f788 ffff88011b4418e0
- Mar 28 00:20:31 localhost kernel: [ 4920.558273] Call Trace:
- Mar 28 00:20:31 localhost kernel: [ 4920.558287] [<ffffffffa1004f40>] ? do_dbs_timer+0x0/0x384 [cpufreq_ondemand]
- Mar 28 00:20:31 localhost kernel: [ 4920.558295] [<ffffffff814d1a55>] rwsem_down_failed_common+0x95/0x1d0
- Mar 28 00:20:31 localhost kernel: [ 4920.558300] [<ffffffff814d1bb3>] rwsem_down_write_failed+0x23/0x30
- Mar 28 00:20:31 localhost kernel: [ 4920.558306] [<ffffffff8126a1a3>] call_rwsem_down_write_failed+0x13/0x20
- Mar 28 00:20:31 localhost kernel: [ 4920.558311] [<ffffffff814d10b2>] ? down_write+0x32/0x40
- Mar 28 00:20:31 localhost kernel: [ 4920.558316] [<ffffffff81187605>] khugepaged+0x8f5/0x1370
- Mar 28 00:20:31 localhost kernel: [ 4920.558323] [<ffffffff81093260>] ? autoremove_wake_function+0x0/0x40
- Mar 28 00:20:31 localhost kernel: [ 4920.558328] [<ffffffff81186d10>] ? khugepaged+0x0/0x1370
- Mar 28 00:20:31 localhost kernel: [ 4920.558332] [<ffffffff81092c86>] kthread+0x96/0xa0
- Mar 28 00:20:31 localhost kernel: [ 4920.558337] [<ffffffff8100c2ca>] child_rip+0xa/0x20
- Mar 28 00:20:31 localhost kernel: [ 4920.558342] [<ffffffff81092bf0>] ? kthread+0x0/0xa0
- Mar 28 00:20:31 localhost kernel: [ 4920.558346] [<ffffffff8100c2c0>] ? child_rip+0x0/0x20
- Mar 28 00:20:31 localhost kernel: [ 4920.558351] INFO: task uksmd:81 blocked for more than 120 seconds.
- Mar 28 00:20:31 localhost kernel: [ 4920.558420] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
- Mar 28 00:20:31 localhost kernel: [ 4920.558548] uksmd D ffff88011b60a0c0 0 81 2 0 0x00000000
- Mar 28 00:20:31 localhost kernel: [ 4920.558553] ffff88011b62fc80 0000000000000046 0000000000000000 ffffea0001eafd40
- Mar 28 00:20:31 localhost kernel: [ 4920.558559] ffff88011b62fc10 80000000a4a2a025 ffff88010a0ee560 00000001003b892f
- Mar 28 00:20:31 localhost kernel: [ 4920.558564] ffff88011b60a6a0 ffff88011b62ffd8 000000000000f788 ffff88011b60a6a0
- Mar 28 00:20:31 localhost kernel: [ 4920.558569] Call Trace:
- Mar 28 00:20:31 localhost kernel: [ 4920.558574] [<ffffffff8111ec90>] ? sync_page+0x0/0x50
- Mar 28 00:20:31 localhost kernel: [ 4920.558578] [<ffffffff814cfdb3>] io_schedule+0xa3/0x110
- Mar 28 00:20:31 localhost kernel: [ 4920.558582] [<ffffffff8111eccd>] sync_page+0x3d/0x50
- Mar 28 00:20:31 localhost kernel: [ 4920.558586] [<ffffffff814d04ea>] __wait_on_bit_lock+0x5a/0xc0
- Mar 28 00:20:31 localhost kernel: [ 4920.558590] [<ffffffff8111ec67>] __lock_page+0x67/0x70
- Mar 28 00:20:31 localhost kernel: [ 4920.558594] [<ffffffff810932a0>] ? wake_bit_function+0x0/0x50
- Mar 28 00:20:31 localhost kernel: [ 4920.558599] [<ffffffff81177102>] cmp_and_merge_page+0x882/0x1a00
- Mar 28 00:20:31 localhost kernel: [ 4920.558604] [<ffffffff81175ca2>] ? get_next_rmap_item+0x1d2/0xdb0
- Mar 28 00:20:31 localhost kernel: [ 4920.558608] [<ffffffff81178ca6>] ksm_scan_thread+0xa26/0x1120
- Mar 28 00:20:31 localhost kernel: [ 4920.558612] [<ffffffff81178280>] ? ksm_scan_thread+0x0/0x1120
- Mar 28 00:20:31 localhost kernel: [ 4920.558616] [<ffffffff81092c86>] kthread+0x96/0xa0
- Mar 28 00:20:31 localhost kernel: [ 4920.558620] [<ffffffff8100c2ca>] child_rip+0xa/0x20
- Mar 28 00:20:31 localhost kernel: [ 4920.558624] [<ffffffff81092bf0>] ? kthread+0x0/0xa0
- Mar 28 00:20:31 localhost kernel: [ 4920.558628] [<ffffffff8100c2c0>] ? child_rip+0x0/0x20
- Mar 28 00:20:31 localhost kernel: [ 4920.558648] INFO: task zeusmp_base.amd:5957 blocked for more than 120 seconds.
- Mar 28 00:20:31 localhost kernel: [ 4920.558773] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
- Mar 28 00:20:31 localhost kernel: [ 4920.558900] zeusmp_base.a D ffff88011a05b0c0 0 5957 5956 0 0x00000080
- Mar 28 00:20:31 localhost kernel: [ 4920.558905] ffff880084875cf0 0000000000000082 ffff880084875cc8 000000000f4775e0
- Mar 28 00:20:31 localhost kernel: [ 4920.558910] ffff8800a955d7b8 ffffea0000000000 ffff8800379d4301 ffff88011add9980
- Mar 28 00:20:31 localhost kernel: [ 4920.558915] ffff88011a05b6a0 ffff880084875fd8 000000000000f788 ffff88011a05b6a0
- Mar 28 00:20:31 localhost kernel: [ 4920.558920] Call Trace:
- Mar 28 00:20:31 localhost kernel: [ 4920.558924] [<ffffffff814d1a55>] rwsem_down_failed_common+0x95/0x1d0
- Mar 28 00:20:31 localhost kernel: [ 4920.558928] [<ffffffff814d1be6>] rwsem_down_read_failed+0x26/0x30
- Mar 28 00:20:31 localhost kernel: [ 4920.558933] [<ffffffff8126a174>] call_rwsem_down_read_failed+0x14/0x30
- Mar 28 00:20:31 localhost kernel: [ 4920.558937] [<ffffffff814d10e4>] ? down_read+0x24/0x30
- Mar 28 00:20:31 localhost kernel: [ 4920.558941] [<ffffffff810414a7>] __do_page_fault+0x187/0x480
- Mar 28 00:20:31 localhost kernel: [ 4920.558946] [<ffffffff810098ee>] ? __switch_to+0x26e/0x320
- Mar 28 00:20:31 localhost kernel: [ 4920.558950] [<ffffffff814cf502>] ? thread_return+0x4e/0x85c
- Mar 28 00:20:31 localhost kernel: [ 4920.558955] [<ffffffff814d4f6e>] do_page_fault+0x3e/0xa0
- Mar 28 00:20:31 localhost kernel: [ 4920.558959] [<ffffffff814d22e5>] page_fault+0x25/0x30
我们可以看到内核提示“task uksmd:81 blocked for more than 120 seconds.”,这告诉我们,uksmd这个进程停止在某处很久了。然后内核打印了这个进程的栈和Call Trace(也就是函数调用嵌套情况),我们重点关注的是Call Trace:。
每个Call Trace的格式是由两部分组成:[<压入栈的地址>] 所在函数名称
其中这个地址是当前函数调用的ret地址,或者换句话说,是这个call指令的下一条(这里看不懂的回去好好复习复习C调用规范和汇编 ),而函数名之前可能有 “?”这样一个符号,这个符号告诉我们,这个不是严格的调用框架上的节点,而是栈里面残存的(没有被后来调用的栈框架洗刷掉)一些调用轨迹。这个往往是说明到目前位置,曾经调用过的一些函数路径。
好,我们看一下着重关注的uksmd的情况
- Mar 28 00:12:31 localhost kernel: [ 4440.558086] [<ffffffff8111ec90>] ? sync_page+0x0/0x50
- Mar 28 00:12:31 localhost kernel: [ 4440.558092] [<ffffffff814cfdb3>] io_schedule+0xa3/0x110
- Mar 28 00:12:31 localhost kernel: [ 4440.558096] [<ffffffff8111eccd>] sync_page+0x3d/0x50
- Mar 28 00:12:31 localhost kernel: [ 4440.558100] [<ffffffff814d04ea>] __wait_on_bit_lock+0x5a/0xc0
- Mar 28 00:12:31 localhost kernel: [ 4440.558105] [<ffffffff8111ec67>] __lock_page+0x67/0x70
- Mar 28 00:12:31 localhost kernel: [ 4440.558110] [<ffffffff810932a0>] ? wake_bit_function+0x0/0x50
- Mar 28 00:12:31 localhost kernel: [ 4440.558116] [<ffffffff81177102>] cmp_and_merge_page+0x882/0x1a00
从上面的轨迹,我们不难看到,这次锁死是出现在一个page lock上,想象一下,这样的情形可能的情况:
- 有一个进程已经获得了这个page lock,而和uksmd互相dead lock在另一个锁上
- 有进程忘记释放page lock了,很有可能是错误的进程自己,这里罪魁祸首最大可能性就是uksmd自己
为了进一步缩小范围,我们接下来分析日志其它部分:
我们发现同时锁死的有khugepaged, 和 zeusmp_base.amd,后面这个进程是我们benchmark进程,在用户层,所以一般是无辜的。而khugepaged同样是一个页表操纵进程(将碎页合并成大页面),所以首先怀疑是它们两者的dead lock,我们接着看khugepaged的call trace:
- Mar 28 00:18:31 localhost kernel: [ 4800.558229] [<ffffffff814d1bb3>] rwsem_down_write_failed+0x23/0x30
- Mar 28 00:18:31 localhost kernel: [ 4800.558234] [<ffffffff8126a1a3>] call_rwsem_down_write_failed+0x13/0x20
- Mar 28 00:18:31 localhost kernel: [ 4800.558239] [<ffffffff814d10b2>] ? down_write+0x32/0x40
很明显,锁在一个信号量(semaphore)上,那么这个具体是哪个呢?其实也很明显:mmap_sem是最大的可能性。因为在合并大页面之前,mmap_sem是要加锁,这个很容易想象(其实具体在collapse_huge_page函数里面),而uksmd显然在合并页面的时候也加了这个锁,所以,不难看出khugepaged其实锁在 uksmd正在工作的mmap_sem上。这个根本原因还是uksmd死锁导致的,而不是它们之间的互相 dead lock。
接下来看zeusmp_base.amd被锁死的场景,也是同一个:mmap_sem,所以不难的出这个现场的基本猜测:
uksmd自己有某个页面没有释放锁,而下一次扫描的时候,堵在了自己加的page lock上,而由于它被锁死,没有释放mmap_sem,所以khugepaged在扫描zeusmp_base.amd内存区域的时候被锁死了,而接着zeusmp_base.amd由于page fault同样要mmap_sem加读锁,被堵在khugepaged的写锁操作上。
现场分析到此结束,看明白了嘛?
http://kerneldedup.org/forum/forum.php?mod=viewthread&tid=17&extra=page%3D1