Analyzing and Resolving a Linux Kernel Crash

I. Problem scenario and environment
System environment:
Red Hat Enterprise Linux 6.4, kernel 2.6.32-358


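Problem:
A rule with NFQUEUE as its target was added to the iptables mangle table. When an HTTP request matched the rule, the machine rebooted immediately. This happened twice, sporadically, but could not be reproduced afterwards on the machines that had rebooted.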


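II. Troubleshooting
1. Checked messages, the kernel log, and dmesg: nothing abnormal.
2. Checked load, CPU, memory, disk I/O, and network I/O before the reboot: all normal.
3. The reboot only happened with NFQUEUE as the target, so the suspicion fell on the system itself, most likely the NFQUEUE handling behind iptables. NFQUEUE passes packets from the kernel to user space for processing, which narrows the problem down to either the kernel or libnetfilter_queue.
4. Watching the server console during the reboot might have shown useful output, but the server sits in the customer's data center, so that was impractical.
5. Checking the reboot history with last showed something unexpected: the entry for the NFQUEUE-related reboot carried a crash record, meaning the system had crashed and then rebooted. That confirmed a system or kernel crash.
6. Linux systems usually have kdump installed and configured by default, so when the kernel crashes, kdump captures the memory contents from before the crash and writes a dump file, vmcore, under /var/crash/<date>. The crash tool can then analyze the vmcore file to recover key information from just before the kernel crash. A search on the machine did turn up the vmcore file for this crash.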


III. Analyzing the vmcore file
1. Install the debuginfo package that matches the kernel:
# yum install kernel-debuginfo-2.6.32-358.el6.x86_64


2. Analyze the vmcore with the crash command shipped with the system:

# crash /usr/lib/debug/lib/modules/2.6.32-358.el6.x86_64/vmlinux vmcore
crash 7.1.0-6.el6
Copyright (C) 2002-2014  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.
GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...
WARNING: kernel version inconsistency between vmlinux and dumpfile
      KERNEL: vmlinux
    DUMPFILE: vmcore  [PARTIAL DUMP]
        CPUS: 40
        DATE: Tue Oct 31 11:53:41 2017
      UPTIME: 342 days, 12:15:26
LOAD AVERAGE: 0.00, 0.02, 0.00
       TASKS: 1050
    NODENAME: web_yp_49_202.mobileztgame
     RELEASE: 2.6.32-358.el6.x86_64
     VERSION: #1 SMP Tue Jan 29 11:47:41 EST 2013
     MACHINE: x86_64  (2499 Mhz)
      MEMORY: 128 GB
       PANIC: "BUG: unable to handle kernel NULL pointer dereference at (null)"
         PID: 0
     COMMAND: "swapper"
        TASK: ffff882069324080  (1 of 40)  [THREAD_INFO: ffff881068896000]
         CPU: 5
       STATE: TASK_RUNNING (PANIC)


The crash output shows the cause of the panic: the kernel dereferenced a NULL pointer.




The bt command shows the stack and related state just before the crash.
Its output is as follows:

crash> bt
PID: 0      TASK: ffff882069324080  CPU: 5   COMMAND: "swapper"
 #0 [ffff8800618a3750] machine_kexec at ffffffff81035b7b
 #1 [ffff8800618a37b0] crash_kexec at ffffffff810c0db2
 #2 [ffff8800618a3880] oops_end at ffffffff815111d0
 #3 [ffff8800618a38b0] no_context at ffffffff81046bfb
 #4 [ffff8800618a3900] __bad_area_nosemaphore at ffffffff81046e85
 #5 [ffff8800618a3950] bad_area_nosemaphore at ffffffff81046f53
 #6 [ffff8800618a3960] __do_page_fault at ffffffff810476b1
 #7 [ffff8800618a3a80] do_page_fault at ffffffff8151311e
 #8 [ffff8800618a3ab0] page_fault at ffffffff815104d5
    [exception RIP: nf_queue+152]
    RIP: ffffffff81475718  RSP: ffff8800618a3b60  RFLAGS: 00010207
    RAX: 0000000000000020  RBX: 0000000000000000  RCX: ffff8810638a3c00
    RDX: 0000000000000002  RSI: ffff880959189980  RDI: 0000000000000000
    RBP: ffff8800618a3bd0   R8: 0000000000021773   R9: 0000000000000001
    R10: 000000000000000e  R11: 0000000000000006  R12: ffff880959189980
    R13: 0000000000000000  R14: ffffffff8147e8b0  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffff8800618a3bd8] nf_hook_slow at ffffffff81474800
#10 [ffff8800618a3c58] ip_rcv at ffffffff8147ef54
#11 [ffff8800618a3c98] __netif_receive_skb at ffffffff8144819b
#12 [ffff8800618a3cf8] netif_receive_skb at ffffffff8144a578
#13 [ffff8800618a3d38] napi_skb_finish at ffffffff8144a680
#14 [ffff8800618a3d58] napi_gro_receive at ffffffff8144cc29
#15 [ffff8800618a3d78] ixgbe_poll at ffffffffa015e44c [ixgbe]
#16 [ffff8800618a3e68] net_rx_action at ffffffff8144cd43
#17 [ffff8800618a3ec8] __do_softirq at ffffffff81076fb1
#18 [ffff8800618a3f38] call_softirq at ffffffff8100c1cc
#19 [ffff8800618a3f50] do_softirq at ffffffff8100de05
#20 [ffff8800618a3f70] irq_exit at ffffffff81076d95
#21 [ffff8800618a3f80] do_IRQ at ffffffff81516c95
--- <IRQ stack> ---
#22 [ffff881068897db8] ret_from_intr at ffffffff8100b9d3
    [exception RIP: intel_idle+222]
    RIP: ffffffff812d37ae  RSP: ffff881068897e68  RFLAGS: 00000206
    RAX: 0000000000000000  RBX: ffff881068897ed8  RCX: 0000000000000000
    RDX: 00000000000e3cb1  RSI: 0000000000000000  RDI: 00000000379d13ba
    RBP: ffffffff8100b9ce   R8: 0000000000000004   R9: 0000000000000050
    R10: 0069229e5ea9dbfa  R11: 0000000000000000  R12: ffff8800618b15a0
    R13: 0000000000000000  R14: 0069229c2b297a40  R15: ffff8800618b16a0
    ORIG_RAX: ffffffffffffff62  CS: 0010  SS: 0018
#23 [ffff881068897ee0] cpuidle_idle_call at ffffffff81414ef7
#24 [ffff881068897f00] cpu_idle at ffffffff81009fc6




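Reading the bt output from bottom to top gives the kernel call chain leading up to the crash (these are kernel function calls, not system calls): an interrupt arrived while the CPU was idle, the softirq receive path ran from ixgbe_poll through ip_rcv and nf_hook_slow, and the page fault hit at the exception RIP, nf_queue+152. The dis command shows the disassembly at that address: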

crash> dis -l ffffffff81475718
/usr/src/debug/kernel-2.6.32-358.el6/linux-2.6.32-358.el6.x86_64/net/netfilter/nf_queue.c: 221
0xffffffff81475718 <nf_queue+152>:      mov    (%rbx),%r12


This pins down the code where the fault occurred:

# vim /usr/src/debug/kernel-2.6.32-358.el6/linux-2.6.32-358.el6.x86_64/net/netfilter/nf_queue.c +221
215         segs = skb_gso_segment(skb, 0);
216         kfree_skb(skb);
217         if (IS_ERR(segs))
218                 return 1;
219
220         do {
221                 struct sk_buff *nskb = segs->next;
222
223                 segs->next = NULL;
224                 if (!__nf_queue(segs, elem, pf, hook, indev, outdev, okfn,
225                                 queuenum))
226                         kfree_skb(segs);
227                 segs = nskb;
228         } while (segs);
229         return 1;
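
The register dump lines up with source line 221. A minimal user-space sketch of why the faulting address is exactly NULL: in the kernel's struct sk_buff, next is the first member, so segs->next is a load from offset 0 of segs. With RBX = 0 in the dump, `mov (%rbx),%r12` reads address 0, which matches the panic message. The names fake_skb, next_load_address, and check_null_segs below are hypothetical stand-ins for illustration, not kernel definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct sk_buff. Only one layout fact is
 * mirrored here: `next` is the first member of the structure. */
struct fake_skb {
    struct fake_skb *next;
    struct fake_skb *prev;
    unsigned int len;
};

/* Address that `segs->next` loads from, given the value held in segs.
 * This is the address `mov (%rbx),%r12` uses when segs lives in RBX. */
static uintptr_t next_load_address(uintptr_t segs)
{
    return segs + offsetof(struct fake_skb, next);
}

/* Self-check: with segs == NULL (RBX = 0), the load targets address 0. */
static void check_null_segs(void)
{
    assert(offsetof(struct fake_skb, next) == 0);
    assert(next_load_address(0) == 0);
}
```

Calling check_null_segs() from a small test program passes silently, which is consistent with the fault being reported "at (null)" rather than at some small non-zero offset.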






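Looking at the skb_gso_segment function (see the comment in its header below), it can return NULL in some cases. The IS_ERR(segs) check in nf_queue only catches error-encoded pointers, not NULL, so segs->next in the code above dereferences a NULL pointer and the kernel crashes. Since GSO is what triggers the problem, adjusting the system's GSO-related settings should be a way to work around it: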

# vim /usr/src/debug/kernel-2.6.32-358.el6/linux-2.6.32-358.el6.x86_64/net/core/dev.c +1728
1728 /**
1729  *      skb_gso_segment - Perform segmentation on skb.
1730  *      @skb: buffer to segment
1731  *      @features: features for the output path (see dev->features)
1732  *
1733  *      This function segments the given skb and returns a list of segments.
1734  *
1735  *      It may return NULL if the skb requires no segmentation.  This is
1736  *      only possible when GSO is used for verifying header integrity.
1737  */
1738 struct sk_buff *skb_gso_segment(struct sk_buff *skb, int features)
1739 {
1740         struct sk_buff *segs = ERR_PTR(-EPROTONOSUPPORT);
1741         struct packet_type *ptype;
1742         __be16 type = skb->protocol;
1743         int err;
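
To make the gap between IS_ERR() and a NULL return concrete, here is a user-space sketch under stated assumptions. The helpers err_ptr, is_err, and is_err_or_null imitate the kernel's ERR_PTR/IS_ERR/IS_ERR_OR_NULL idea (errors encoded in the top 4095 addresses); struct seg and queue_segments are hypothetical stand-ins for the segment list and the loop in nf_queue.c. This is not kernel code and not a copy of the linked patch, only an illustration of the kind of guard that avoids the dereference.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* User-space imitations of the kernel helpers: an error code is carried
 * as a pointer value in the last MAX_ERRNO addresses of the address space. */
static void *err_ptr(long error) { return (void *)(intptr_t)error; }
static int is_err(const void *p) { return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO; }
static int is_err_or_null(const void *p) { return p == NULL || is_err(p); }

/* Hypothetical stand-in for a segment in the list skb_gso_segment returns. */
struct seg {
    struct seg *next;
    int queued;
};

/* Same loop shape as in nf_queue.c, but guarded against NULL as well as
 * error pointers. Returns the number of segments handled, or -1 when the
 * list is unusable. */
static int queue_segments(struct seg *segs)
{
    int handled = 0;

    if (is_err_or_null(segs))
        return -1;

    do {
        struct seg *nseg = segs->next;

        segs->next = NULL;   /* detach the segment from the list */
        segs->queued = 1;    /* stands in for handing it to the queue */
        handled++;
        segs = nseg;
    } while (segs);

    return handled;
}

/* Self-check: NULL is not an error pointer, so an is_err-only test lets it through. */
static void check_null_slips_past_is_err(void)
{
    assert(!is_err(NULL));
    assert(is_err(err_ptr(-EPROTONOSUPPORT)));
    assert(queue_segments(NULL) == -1);
}
```

With only the is_err check, a NULL list would reach the do/while body and fault on the first segs->next, which is the crash seen in the backtrace.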


The corresponding patch found online:
https://patchwork.kernel.org/patch/6615071/


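IV. Reproducing the problem
1. When the problem was first noticed, the attempt to reproduce it used the following request: curl "t.test.com". The problem did not reproduce.
2. Later, after reading up on TSO/GSO/LRO/GRO, the likely explanation seemed to be that the request was too small to trigger packet segmentation and reassembly, so the faulty path was never reached. With a larger request, the following URL reproduced the problem: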
# curl "t.test.com/v2/user-manage/css/bootstrap.min.css?test1=sdfsfsdfsdfa&test2_id=2234234234234234234&test_id=50129009890098&test_token=1670056402|_80_m_lxxj1298|1493196793|c726299f2d03b8462764bacf20e2395f|sdfsdfdsfsdffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffsdfsdfsdfdsfsdfhgjgjghjghjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjfhjgjfghjfjfhjjjjjjjjjjjjjjjjjjjjjfffffadfsfsdfsdfsdfsdfsdfdsfdssdfsdfsdfsdfsdfsdf"
The related ipset and iptables rules are as follows:



# ipset create lee hash:ip hashsize 819200 maxelem 100000 timeout 300
# ipset add lee 1.1.1.1 timeout 300
# iptables -t mangle -I PREROUTING -p tcp -m multiport --dports 80,443 -m set --match-set lee src -m string --string t.test.com --algo kmp --from 0 --to 1480 -j NFQUEUE


V. Conclusion
This is a Linux kernel bug.


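VI. Solutions
1. Upgrade the kernel. Judging from the patch and the source code, kernels after 3.0 should have fixed this problem; the 3.10 kernel source already contains the fix.
2. Use DROP instead of the NFQUEUE target when adding the iptables rule (this is the recommended approach).
3. Adjust the NIC's GSO-related offload settings. Turning off LRO resolved the reboot problem. The command:

# ethtool -K eth0 lro off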


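About LRO:
Linux 2.6.24 added LRO (Large Receive Offload) support for IPv4 TCP. It aggregates multiple TCP segments into a single skb and hands it to the upper network stack later as one large packet, which cuts the cost of processing skbs in the upper layers and improves the system's capacity to receive TCP packets. All of this requires support from the NIC driver.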