Kernel crash caused by a NULL pointer dereference in sync_supers()

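This bug appears at random: sometimes after 4 hours, sometimes only after more than 40 hours. It was truly frightening and kept me stuck for a full two weeks.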

The kernel NULL pointer crash looks like this:

[107943.320000] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[107943.320000] pgd = c0004000
[107943.330000] [00000000] *pgd=00000000
[107943.330000] Internal error: Oops: 17 [#1] PREEMPT
[107943.330000] Modules linked in: ieee1588 veex_specnet pppoe pppox ppp_async ppp_generic slhc 8021q ipv6 g_ether
[107943.330000] CPU: 0    Not tainted  (2.6.22.6-UX400-MX27 #74)
[107943.330000] PC is at sync_supers+0x48/0x114
[107943.330000] LR is at sync_supers+0x14/0x114
[107943.330000] pc : [<c008ea74>]    lr : [<c008ea40>]    psr: 60000013
[107943.330000] sp : c7fb1f08  ip : c7fb1f08  fp : c7fb1f24
[107943.330000] r10: 00000000  r9 : c034d70c  r8 : 00000001
[107943.330000] r7 : c7fb0000  r6 : c7fb1f98  r5 : c032d804  r4 : 00000000
[107943.330000] r3 : 00000000  r2 : 00000000  r1 : 00000001  r0 : 00000001
[107943.330000] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  Segment kernel
[107943.330000] Control: 0005317f  Table: a6e48000  DAC: 00000017
[107943.330000] Process pdflush (pid: 71, stack limit = 0xc7fb0260)
[107943.330000] Stack: (0xc7fb1f08 to 0xc7fb2000)
[107943.330000] 1f00:                   c7fb0000 c7fb1f28 c7fb1f98 c0328c08 c7fb1f84 c7fb1f28 
[107943.330000] 1f20: c006deb4 c008ea3c 00000000 00000000 c7fb1f5c 00000000 00000000 00000000 
[107943.330000] 1f40: 00000000 00000000 00000000 00000000 00000025 00000000 c7fb1f98 c7fb0000 
[107943.330000] 1f60: c7fb0000 c032d490 c7fb1f98 c0328c08 00000001 00000000 c7fb1fd4 c7fb1f88 
[107943.330000] 1f80: c006e754 c006de44 c027fdf0 c7cc4ac0 c006de34 00000000 c7fb1f98 c7fb1f98 
[107943.330000] 1fa0: 00a43e38 00000001 c7fb1fd4 c7fb0000 c034c508 c006e5e4 00000000 00000000 
[107943.330000] 1fc0: 00000000 00000000 c7fb1ff4 c7fb1fd8 c0050e80 c006e5f4 00000000 00000000 
[107943.330000] 1fe0: 00000000 00000000 00000000 c7fb1ff8 c003df2c c0050e38 e797dfff ffffffff 
[107943.330000] Backtrace: 
[107943.330000] [<c008ea2c>] (sync_supers+0x0/0x114) from [<c006deb4>] (wb_kupdate+0x80/0x170)
[107943.330000]  r7:c0328c08 r6:c7fb1f98 r5:c7fb1f28 r4:c7fb0000
[107943.330000] [<c006de34>] (wb_kupdate+0x0/0x170) from [<c006e754>] (pdflush+0x170/0x278)
[107943.330000] [<c006e5e4>] (pdflush+0x0/0x278) from [<c0050e80>] (kthread+0x58/0x90)
[107943.330000] [<c0050e28>] (kthread+0x0/0x90) from [<c003df2c>] (do_exit+0x0/0x9d0)
[107943.330000]  r7:00000000 r6:00000000 r5:00000000 r4:00000000
[107943.330000] Code: e5d42011 e3520000 1a00000b e1a04003 (e5933000) 
[107943.330000] Sleep one second before cut off the power for SD card protection!
[107943.330000] BUG: scheduling while atomic: pdflush/0x00000002/71
[107943.330000] [<c002924c>] (dump_stack+0x0/0x14) from [<c0280058>] (schedule+0x5e4/0x688)
[107943.330000] [<c027fa74>] (schedule+0x0/0x688) from [<c0280de4>] (schedule_timeout+0x74/0xd8)
[107943.330000] [<c0280d70>] (schedule_timeout+0x0/0xd8) from [<c0280e9c>] (schedule_timeout_interruptible+0x28/0x2c)
[107943.330000]  r6:c7cc4ac0 r5:c7fb1f24 r4:c7fb0000
[107943.330000] [<c0280e74>] (schedule_timeout_interruptible+0x0/0x2c) from [<c004502c>] (msleep_interruptible+0x3c/0x4c)
[107943.330000] [<c0044ff0>] (msleep_interruptible+0x0/0x4c) from [<c0028d2c>] (die+0x158/0x1dc)
[107943.330000]  r5:c7fb1f24 r4:c7fb1ec0
[107943.330000] [<c0028bd4>] (die+0x0/0x1dc) from [<c002a3d4>] (__do_kernel_fault+0x6c/0x7c)
[107943.330000] [<c002a368>] (__do_kernel_fault+0x0/0x7c) from [<c002a4cc>] (do_page_fault+0xe8/0x260)
[107943.330000]  r7:00000000 r6:c7fb1ec0 r5:c7cc4ac0 r4:00000000
[107943.330000] [<c002a3e4>] (do_page_fault+0x0/0x260) from [<c002427c>] (do_DataAbort+0x38/0x9c)
[107943.330000] [<c0024244>] (do_DataAbort+0x0/0x9c) from [<c0024aac>] (__dabt_svc+0x4c/0x60)
[107943.330000] Exception stack(0xc7fb1ec0 to 0xc7fb1f08)
[107943.330000] 1ec0: 00000001 00000001 00000000 00000000 00000000 c032d804 c7fb1f98 c7fb0000 
[107943.330000] 1ee0: 00000001 c034d70c 00000000 c7fb1f24 c7fb1f08 c7fb1f08 c008ea40 c008ea74 
[107943.330000] 1f00: 60000013 ffffffff                                                       
[107943.330000]  r7:c7fb0000 r6:c7fb1f98 r5:c7fb1ef4 r4:ffffffff
[107943.330000] [<c008ea2c>] (sync_supers+0x0/0x114) from [<c006deb4>] (wb_kupdate+0x80/0x170)
[107943.330000]  r7:c0328c08 r6:c7fb1f98 r5:c7fb1f28 r4:c7fb0000
[107943.330000] [<c006de34>] (wb_kupdate+0x0/0x170) from [<c006e754>] (pdflush+0x170/0x278)
[107943.330000] [<c006e5e4>] (pdflush+0x0/0x278) from [<c0050e80>] (kthread+0x58/0x90)
[107943.330000] [<c0050e28>] (kthread+0x0/0x90) from [<c003df2c>] (do_exit+0x0/0x9d0)
[107943.330000]  r7:00000000 r6:00000000 r5:00000000 r4:00000000
[107947.210000] Rx buffer full!
[107949.210000] Rx buffer full!
[107951.210000] Rx buffer full!
[107953.200000] Rx buffer full!
[107955.200000] Rx buffer full!
[107957.200000] Rx buffer full!
[107959.200000] Rx buffer full!
[107961.200000] Rx buffer full!
[107963.200000] Rx buffer full!
[107965.200000] Rx buffer full!
[107967.200000] Rx buffer full!
[107969.200000] Rx buffer full!
[107971.200000] Rx buffer full!
[107973.190000] Rx buffer full!
[107975.190000] Rx buffer full!
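
At first I kept assuming the ieee1588 module was to blame, but a week spent optimizing that module did not fix anything. I eventually realized I had been looking in the wrong place, went on to trace the sync_supers() function, and finally found where the problem was.

It turned out that the Linux version was too old and the kernel itself has a bug.

In linux-2.6.22, sync_supers() is defined as follows: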

void sync_supers(void)
{
        struct super_block *sb;

        spin_lock(&sb_lock);
restart:
        list_for_each_entry(sb, &super_blocks, s_list) {
                if (sb->s_dirt) {
                        sb->s_count++;
                        spin_unlock(&sb_lock);
                        down_read(&sb->s_umount);
                        write_super(sb);
                        up_read(&sb->s_umount);
                        spin_lock(&sb_lock);
                        if (__put_super_and_need_restart(sb))
                                goto restart;
                }
        }
        spin_unlock(&sb_lock);
}
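
The write_super(sb) call above is made without any NULL pointer check, and the result is the NULL pointer kernel crash shown at the top.

Linux 2.6.35, by contrast, adds several such checks: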

void sync_supers(void)
{
        struct super_block *sb, *n;

        spin_lock(&sb_lock);
        list_for_each_entry_safe(sb, n, &super_blocks, s_list) {
                if (list_empty(&sb->s_instances))
                        continue;
                if (sb->s_op->write_super && sb->s_dirt) {
                        sb->s_count++;
                        spin_unlock(&sb_lock);

                        down_read(&sb->s_umount);
                        if (sb->s_root && sb->s_dirt)
                                sb->s_op->write_super(sb);
                        up_read(&sb->s_umount);

                        spin_lock(&sb_lock);
                        /* lock was dropped, must reset next */
                        list_safe_reset_next(sb, n, s_list);
                        __put_super(sb);
                }
        }
        spin_unlock(&sb_lock);
}
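
To make the extra checks in the 2.6.35 listing concrete, here is a minimal userspace sketch of the same guard pattern: verify that the callback exists before calling it, and re-check `s_root` and `s_dirt` right before the call. The types `sb_model` and `sb_ops_model` and the functions `demo_write_super` and `sync_one_model` are hypothetical stand-ins invented for this illustration. They are not kernel APIs, and locking and reference counting are left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for struct super_block and its op table. */
struct sb_model;

struct sb_ops_model {
    void (*write_super)(struct sb_model *sb);  /* may legitimately be NULL */
};

struct sb_model {
    const struct sb_ops_model *s_op;
    void *s_root;   /* NULL models a superblock that is being torn down */
    int s_dirt;     /* non-zero when the superblock needs writing */
    int writes;     /* demo counter: how many times write_super ran */
};

/* Demo callback: record the write and clear the dirty flag. */
void demo_write_super(struct sb_model *sb)
{
    sb->writes++;
    sb->s_dirt = 0;
}

/*
 * Mirrors the two guards in the 2.6.35 listing (locking omitted):
 * returns 1 if write_super was invoked, 0 if a guard skipped it.
 */
int sync_one_model(struct sb_model *sb)
{
    assert(sb != NULL && sb->s_op != NULL);

    /* First guard: the callback must exist and the superblock must be dirty. */
    if (!(sb->s_op->write_super && sb->s_dirt))
        return 0;

    /* Second guard: re-check state immediately before the call. */
    if (sb->s_root && sb->s_dirt) {
        sb->s_op->write_super(sb);
        return 1;
    }
    return 0;
}
```

Calling `sync_one_model()` on a model whose op table has `write_super == NULL`, or whose `s_root` is NULL, returns 0 without touching the callback; only a dirty, mounted model with a callback gets written.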
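
From now on, when I run into a kernel NULL pointer problem, the place to start is usually the two lines below; the cause generally lies in the functions named by PC and LR: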
[107943.330000] PC is at sync_supers+0x48/0x114
[107943.330000] LR is at sync_supers+0x14/0x114
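
As a small aid for reading these two lines, the sketch below splits a `symbol+0xoffset/0xsize` location into its parts. The struct `oops_loc` and the function `parse_oops_loc` are hypothetical names made up for this example; only standard C library calls are used.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for one "symbol+0xoffset/0xsize" location. */
struct oops_loc {
    char symbol[64];       /* function name, e.g. "sync_supers" */
    unsigned long offset;  /* byte offset of the address inside the function */
    unsigned long size;    /* total size of the function in bytes */
};

/*
 * Looks for `tag` (for example "PC is at") in `line` and parses the
 * location that follows it. Returns 1 on success, 0 otherwise.
 */
int parse_oops_loc(const char *line, const char *tag, struct oops_loc *out)
{
    const char *p;

    assert(line != NULL && tag != NULL && out != NULL);

    p = strstr(line, tag);
    if (p == NULL)
        return 0;
    p += strlen(tag);

    /* %63[^+] reads the symbol up to '+'; %lx accepts the "0x" prefix. */
    if (sscanf(p, " %63[^+]+%lx/%lx",
               out->symbol, &out->offset, &out->size) != 3)
        return 0;
    return 1;
}
```

For the PC line this yields symbol `sync_supers`, offset 0x48 and size 0x114. The backtrace shows sync_supers+0x0 at c008ea2c, and c008ea2c + 0x48 = c008ea74, which is exactly the `pc` value in the register dump; the same arithmetic with 0x14 gives the `lr` value c008ea40. If a vmlinux with debug information that matches the running kernel is available (the post does not say whether one was), an address like this can be mapped to a source line with a tool such as `addr2line -e vmlinux`.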
The call chain leading into this function is:

[107943.330000] [<c008ea2c>] (sync_supers+0x0/0x114) from [<c006deb4>] (wb_kupdate+0x80/0x170)
[107943.330000]  r7:c0328c08 r6:c7fb1f98 r5:c7fb1f28 r4:c7fb0000
[107943.330000] [<c006de34>] (wb_kupdate+0x0/0x170) from [<c006e754>] (pdflush+0x170/0x278)
[107943.330000] [<c006e5e4>] (pdflush+0x0/0x278) from [<c0050e80>] (kthread+0x58/0x90)
[107943.330000] [<c0050e28>] (kthread+0x0/0x90) from [<c003df2c>] (do_exit+0x0/0x9d0)


