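If a kernel panic occurs in the code of a dynamically loaded module, and the module was installed with INSTALL_MOD_STRIP=--strip-unneeded, the call trace printed after the panic may contain frames with no function symbol at all, or frames resolved to the wrong symbol. For example, cscope shows that function A never calls function B, yet the call trace shows A calling B. The likely cause is that function C actually called B, but C's symbol was stripped from the .ko file and C's code sits right after the end of A's code, so the kernel treats code that belongs to C as part of A. Such an abnormal call chain can be untangled by building both an unstripped and a stripped .ko file, disassembling both, and comparing the results. Example: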
<0>[ 8414.123135] EIP: [<c12ef56e>] vfree+0x6e/0x70 SS:ESP 0068:f580decc
<4>[ 8414.123166] ---[ end trace efa0f272e653f075 ]---
<0>[ 8414.123176] Kernel panic - not syncing: Fatal exception in interrupt
<4>[ 8414.123190] Pid: 1167, comm: droid.gallery3d Tainted: G D C 3.0.34-140447-gae65631 #1
<4>[ 8414.123201] Call Trace:
<4>[ 8414.123217] [<c185c6ad>] ? printk+0x1d/0x1f
<4>[ 8414.123232] [<c185c594>] panic+0x66/0x162
<4>[ 8414.123249] [<c1867845>] oops_end+0xc5/0xd0
<4>[ 8414.123265] [<c1205414>] die+0x54/0x80
<4>[ 8414.123281] [<c1866fe6>] do_trap+0x96/0xd0
<4>[ 8414.123295] [<c1202d10>] ? do_bounds+0x80/0x80
<4>[ 8414.123309] [<c1202d9b>] do_invalid_op+0x8b/0xa0
<4>[ 8414.123326] [<c12ef56e>] ? vfree+0x6e/0x70
<4>[ 8414.123358] [<f8712de2>] ? hmm_store+0xe2/0x1a0 [atomisp]
<4>[ 8414.123388] [<f8715061>] ? hrt_isp_css_mm_store+0x11/0x30 [atomisp]
<4>[ 8414.123417] [<f870fed7>] ? sh_css_params_write_to_ddr+0x1437/0x1a00 [atomisp]
<4>[ 8414.123436] [<c12bce5e>] ? cpupri_set+0xbe/0x100
<4>[ 8414.123454] [<c1202d10>] ? do_bounds+0x80/0x80
<4>[ 8414.123473] [<c1496608>] ? trace_hardirqs_off_thunk+0xc/0x14
<4>[ 8414.123489] [<c1866da3>] error_code+0x5f/0x64
<4>[ 8414.123506] [<c1202d10>] ? do_bounds+0x80/0x80
<4>[ 8414.123521] [<c12ef56e>] ? vfree+0x6e/0x70
<4>[ 8414.123537] [<c18664c6>] ? _raw_spin_unlock_irqrestore+0x26/0x50
<4>[ 8414.123565] [<f87185f9>] atomisp_kernel_free+0x39/0x50 [atomisp]
<4>[ 8414.123589] [<f8703298>] sh_css_free+0x18/0x20 [atomisp]
<4>[ 8414.123617] [<f870d9ef>] sh_css_shading_table_free+0x1f/0x40 [atomisp]
<4>[ 8414.123641] [<f87012c1>] 0xf87012c0
<4>[ 8414.123665] [<f8704cba>] sh_css_video_start+0x3a/0x650 [atomisp]
<4>[ 8414.123694] [<f8718243>] atomisp_reqbufs+0x12e3/0x1600 [atomisp] <== this function is far too large
<4>[ 8414.123721] [<f8718ad4>] atomisp_isr+0x2e4/0x370 [atomisp]
Note the following log lines:
<4>[ 8414.123694] [<f8718243>] atomisp_reqbufs+0x12e3/0x1600 [atomisp]
<4>[ 8414.123665] [<f8704cba>] sh_css_video_start+0x3a/0x650 [atomisp]
According to the call trace, the size of atomisp_reqbufs is 0x1600 (5632 bytes) and the size of sh_css_video_start is 0x650 (1616 bytes).
The entries below are from readelf -a /lib/atomisp.ko:
485: 00015f60 500 FUNC GLOBAL DEFAULT 1 atomisp_reqbufs
155: 00003c80 732 FUNC GLOBAL DEFAULT 1 sh_css_video_start
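The size check described here can be sketched in Python. This is only an illustration: the helper names (trace_sizes, size_ratios) are made up for this note, the regular expression assumes the frame format seen in the trace above, and the readelf sizes are typed in by hand rather than parsed from readelf output.

```python
import re

# Matches frames such as "[<f8718243>] atomisp_reqbufs+0x12e3/0x1600 [atomisp]",
# with an optional "? " marker in front of the symbol name.
FRAME = re.compile(r"\[<[0-9a-f]+>\]\s+(?:\?\s+)?(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")

def trace_sizes(lines):
    """Map each symbol name in the call trace to the size the kernel printed."""
    sizes = {}
    for line in lines:
        m = FRAME.search(line)
        if m:
            sizes[m.group(1)] = int(m.group(3), 16)
    return sizes

def size_ratios(trace, elf_sizes):
    """Ratio of trace size to ELF symbol size, for names present in both."""
    return {name: trace[name] / elf_sizes[name]
            for name in trace if name in elf_sizes}

trace_lines = [
    "<4>[ 8414.123694] [<f8718243>] atomisp_reqbufs+0x12e3/0x1600 [atomisp]",
    "<4>[ 8414.123665] [<f8704cba>] sh_css_video_start+0x3a/0x650 [atomisp]",
]
# Sizes copied by hand from the readelf entries quoted above.
readelf_sizes = {"atomisp_reqbufs": 500, "sh_css_video_start": 732}

sizes = trace_sizes(trace_lines)
ratios = size_ratios(sizes, readelf_sizes)
for name, ratio in ratios.items():
    print(f"{name}: trace {sizes[name]} bytes, ELF {readelf_sizes[name]} bytes, x{ratio:.1f}")
```

A ratio slightly above 1 can simply come from alignment padding between functions; a ratio far above 1 suggests that the symbols of neighbouring functions are missing.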
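cscope shows that atomisp_isr never calls atomisp_reqbufs. Comparing the sizes, atomisp_reqbufs is 5632 bytes according to the call trace but only 500 bytes according to readelf, a difference of more than 10 times, which is abnormal. Comparing the disassembly (objdump -Dx atomisp.ko) of the stripped and unstripped versions shows: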
17acf: e8 dc f6 ff ff call 171b0 <atomisp_reqbufs+0x1250>
17acf: e8 dc f6 ff ff call 171b0 <atomisp_start_binary>
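The machine code is identical, but the symbols differ, so something must be going on: the stripped version labels the call target atomisp_reqbufs+0x1250, while the unstripped version labels it atomisp_start_binary. Further comparison shows that the region attributed to atomisp_reqbufs contains several copies of the following instruction pair: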
push %ebp
mov %esp,%ebp
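These two instructions are the usual function prologue: they appear at the start of each function and save the caller's frame pointer. Seeing them more than once inside one symbol means the atomisp_reqbufs symbol has swallowed code that really belongs to other functions.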
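The mis-attribution can be reproduced with a small Python sketch of a "nearest preceding symbol" lookup, which is roughly how an address gets turned into name+offset. The resolve helper is hypothetical, and the symbol tables hold only the two symbols that appear in this note (addresses taken from the readelf and objdump output above); the real module has many more.

```python
import bisect

def resolve(symbols, addr):
    """Return 'name+0xoff' using the nearest symbol at or below addr."""
    table = sorted((start, name) for name, start in symbols.items())
    starts = [start for start, _ in table]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return hex(addr)  # no symbol at or below this address
    start, name = table[i]
    return f"{name}+{hex(addr - start)}"

# Unstripped module: the static function still has its own symbol.
unstripped = {"atomisp_reqbufs": 0x15f60, "atomisp_start_binary": 0x171b0}
# Stripped module: only the global symbol survives.
stripped = {"atomisp_reqbufs": 0x15f60}

call_target = 0x171b0           # target of the call at 17acf
return_addr = 0x15f60 + 0x12e3  # the atomisp_reqbufs+0x12e3 frame in the trace

print(resolve(stripped, call_target))    # atomisp_reqbufs+0x1250
print(resolve(unstripped, call_target))  # atomisp_start_binary+0x0
print(resolve(stripped, return_addr))    # atomisp_reqbufs+0x12e3
print(resolve(unstripped, return_addr))  # atomisp_start_binary+0x93
```

With only these two symbols, the frame printed as atomisp_reqbufs+0x12e3 resolves to atomisp_start_binary+0x93 in the unstripped table. Whether that is the real owner depends on the other symbols in the unstripped module, which are not shown here.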
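PS: Calling BUG_ON(1) produces an invalid-opcode panic, because on x86 BUG() executes the ud2 instruction; that is why do_invalid_op appears in the trace. Thanks to a senior colleague for pointing this out; I had assumed the kernel's code segment had been modified: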
<4>[ 27.947927] [<c185c63d>] ? printk+0x1d/0x1f
<4>[ 27.947936] [<c185c524>] panic+0x66/0x162
<4>[ 27.947946] [<c18677c9>] oops_end+0xb9/0xd0
<4>[ 27.947957] [<c1205414>] die+0x54/0x80
<4>[ 27.947966] [<c1866f76>] do_trap+0x96/0xd0
<4>[ 27.947974] [<c1202d10>] ? do_bounds+0x80/0x80
<4>[ 27.947983] [<c1202d9b>] do_invalid_op+0x8b/0xa0