使用mdb调试Solaris内核crash文件

转自:http://blog.csdn.net/hotsolaris/archive/2008/03/10/2163306.aspx

Working with Kernel Cores

The most common type of kernel debug target is a core file, saved from a prior system crash. In the following sections, we highlight some of the introductory steps as used with mdb to explore a kernel core image.

14.1.1. Locating and Attaching the Target

If a system has crashed, then we should have a core image saved in /var/crash on the target machine. The mdb debugger should be invoked from a system with the same architecture and Solaris revision as the crash image. The first steps are to locate the appropriate saved image and then to invoke mdb.

#  cd /var/crash/nodename

# ls
bounds unix.1 unix.3 unix.5 unix.7 vmcore.1 vmcore.3 vmcore.5 vmcore.7
unix.0 unix.2 unix.4 unix.6 vmcore.0 vmcore.2 vmcore.4 vmcore.6

# mdb -k unix.7 vmcore.7
Loading modules: [ unix krtld$c
genunix specfs dtrace ufs ip sctp usba uhci s1394 fcp fctl nca lofs zfs random nfs
audiosup sppp crypto md fcip logindmux ptm ipc ]
>

14.1.2. Examining Kernel Core Summary Information

The kernel core contains important summary information from which we can extract the following:

  • Revision of the kernel

  • Hostname

  • CPU and platform architecture of the system

  • Panic string

  • Module causing the panic

We can use the ::showrev and ::status dcmds to extract this information.

> ::showrev
Hostname: zones-internal
Release: 5.11
Kernel architecture: i86pc
Application architecture: i386
Kernel version: SunOS 5.11 i86pc snv_27
Platform: i86pc
> ::status
debugging crash dump vmcore.2 (32-bit) from zones-internal
operating system: 5.11 snv_27 (i86pc)
panic message: BAD TRAP: type=e (#pf Page fault) rp=d2a587c8 addr=0 occurred in module
"unix" due to a NULL pointer dereference
dump content: kernel pages only
> ::panicinfo
cpu 0
thread d2a58de0
message BAD TRAP: type=e (#pf Page fault) rp=d2a587c8 addr=0 occurred in module
"unix" due to a NULL pointer dereference
gs fe8301b0
fs fec30000
es fe8d0160
ds d9820160
edi 0
esi dc062298
ebp d2a58828
esp d2a58800
ebx de453000
edx d2a58de0
ecx 1
eax 0
trapno e
err 2
eip fe82ca58
cs 158
eflags 10282
uesp fe89ab0d
ss 0
gdt fec1f2f002cf
idt fec1f5c007ff
ldt 140
task 150
cr0 8005003b
cr2 0
cr3 4cb3000
cr4 6d8

14.1.3. Examining the Message Buffer

The kernel keeps a cyclic buffer of the recent kernel messages. In this buffer we can observe the messages up to the time of the panic. The ::msgbuf dcmd shows the contents of the buffer.

> ::msgbuf
MESSAGE
/pseudo/zconsnex@1/zcons@5 (zcons5) online
/pseudo/zconsnex@1/zcons@6 (zcons6) online
/pseudo/zconsnex@1/zcons@7 (zcons7) online
pseudo-device: ramdisk1024
...
panic[cpu0]/thread=d2a58de0:
BAD TRAP: type=e (#pf Page fault) rp=d2a587c8 addr=0 occurred in module "unix" due to a
NULL pointer dereference


sched:
#pf Page fault
Bad kernel fault at addr=0x0
pid=0, pc=0xfe82ca58, sp=0xfe89ab0d, eflags=0x10282
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6d8<xmme,fxsr,pge,mce,pse,de>
cr2: 0 cr3: 4cb3000
gs: fe8301b0 fs: fec30000 es: fe8d0160 ds: d9820160
edi: 0 esi: dc062298 ebp: d2a58828 esp: d2a58800
ebx: de453000 edx: d2a58de0 ecx: 1 eax: 0
trp: e err: 2 eip: fe82ca58 cs: 158
efl: 10282 usp: fe89ab0d ss: 0
...

14.1.4. Obtaining a Stack Trace of the Running Thread

We can obtain a stack backtrace of the current thread by using the $C command. Note that the displayed arguments to each function are not necessarily accurate. On each platform, the meaning of the shown arguments is as follows:

  • SPARC. The values of the arguments if they are available from a saved stack frame, assuming they are not overwritten by use of registers during the called function. With SPARC architectures, a function's input argument registers are sometimes saved on the way out of a functionif the input registers are reused during the function, then values of the input arguments are overwritten and lost.

  • x86. Accurate values of the input arguments. Input arguments are always saved onto the stack and can be accurately displayed

  • x64. The values of the arguments, assuming they are available. As with the SPARC architectures, input arguments are passed in registers and may be overwritten.

> $C
d2a58828 atomic_add_32+8(0)
d2a58854 nfs4_async_inactive+0x3b(dc1c29c0, 0)
d2a58880 nfs4_inactive+0x41()
d2a5889c fop_inactive+0x15(dc1c29c0, 0)
d2a588b0 vn_rele+0x4b(dc1c29c0)
d2a588c0 snf_smap_desbfree+0x59(dda94080)
d2a588dc dblk_lastfree_desb+0x13(de45b520, d826fb40)
d2a588f4 dblk_decref+0x4e(de45b520, d826fb40)
d2a58918 freemsg+0x69(de45b520)
d2a5893c FreeTxSwPacket+0x3b(d38b84f0)
d2a58968 CleanTxInterrupts+0xb4(d2f9cac0)
d2a589a4 e1000g_send+0xf6(d2f9cac0, d9ffba00)
d2a589c0 e1000g_m_tx+0x22()
d2a589dc dls_tx+0x16(d4520f68, d9ffba00)
d2a589f4 str_mdata_fastpath_put+0x1e(d3843f20, d9ffba00)
d2a58a40 tcp_send_data+0x62d(db0ecac0, d97ee250, d9ffba00)
d2a58aac tcp_send+0x6b6(d97ee250, db0ecac0, 564, 28, 14, 0)
d2a58b40 tcp_wput_data+0x622(db0ecac0, 0, 0)
d2a58c28 tcp_rput_data+0x2560(db0ec980, db15bd20, d2d45f40)
d2a58c40 tcp_input+0x3c(db0ec980, db15bd20, d2d45f40)
d2a58c78 squeue_enter_chain+0xe9(d2d45f40, db15bd20, db15bd20, 1, 1)
d2a58cec ip_input+0x658(d990e554, d3164010, 0, e)
d2a58d40 i_dls_link_ether_rx+0x156(d4523db8, d3164010, db15bd20)
d2a58d70 mac_rx+0x56(d3520200, d3164010, db15bd20)
d2a58dac e1000g_intr+0xa6(d2f9cac0, 0)
d2a58ddc intr_thread+0x122()

14.1.5. Which Process?

If the stack trace is of a kernel housekeeping or interrupt thread, the process reported for the thread will be that of p0"sched." The process pointer for the thread can be obtained with ::tHRead, and ::ps will then display summary information about that process. In this example, the thread is an interrupt thread (as indicated by the top entry in the stack from $C), and the process name maps to sched.

> d2a58de0::thread -p
ADDR PROC LWP CRED
d2a58de0 fec1d280 0 d9d1cf38
> fec1d280::ps -t
S PID PPID PGID SID UID FLAGS ADDR NAME
R 0 0 0 0 0 0x00000001 fec1d280 sched
T t0 <TS_STOPPED>

14.1.6. Disassembling the Suspect Code

Once we've located the thread of interest, we often learn more about what happened by disassembling the target and looking at the instruction that reportedly caused the panic. MDB's ::dis dcmd will disassemble the code around the target instruction that we extract from the stack backtrace.

> $C
d2a58828 atomic_add_32+8(0)
d2a58854 nfs4_async_inactive+0x3b(dc1c29c0, 0)
d2a58880 nfs4_inactive+0x41()
d2a5889c fop_inactive+0x15(dc1c29c0, 0)
d2a588b0 vn_rele+0x4b(dc1c29c0)
...
> nfs4_async_inactive+0x3b::dis
nfs4_async_inactive+0x1a: pushl $0x28
nfs4_async_inactive+0x1c: call +0x51faa30 <kmem_alloc>
nfs4_async_inactive+0x21: addl $0x8,%esp
nfs4_async_inactive+0x24: movl %eax,%esi
nfs4_async_inactive+0x26: movl $0x0,(%esi)
nfs4_async_inactive+0x2c: movl -0x4(%ebp),%eax
nfs4_async_inactive+0x2f: movl %eax,0x4(%esi)
nfs4_async_inactive+0x32: movl 0xc(%ebp),%edi
nfs4_async_inactive+0x35: pushl %edi
nfs4_async_inactive+0x36: call +0x51b7cdc <crhold>
nfs4_async_inactive+0x3b: addl $0x4,%esp
nfs4_async_inactive+0x3e: movl %edi,0x8(%esi)
nfs4_async_inactive+0x41: movl $0x4,0xc(%esi)
nfs4_async_inactive+0x48: leal 0xe0(%ebx),%eax
nfs4_async_inactive+0x4e: movl %eax,-0x8(%ebp)
nfs4_async_inactive+0x51: pushl %eax
nfs4_async_inactive+0x52: call +0x51477f4 <mutex_enter>
nfs4_async_inactive+0x57: addl $0x4,%esp
nfs4_async_inactive+0x5a: cmpl $0x0,0xd4(%ebx)
nfs4_async_inactive+0x61: je +0x7e <nfs4_async_inactive+0xdf>
nfs4_async_inactive+0x63: cmpl $0x0,0xd0(%ebx)
> crhold::dis
crhold: pushl %ebp
crhold+1: movl %esp,%ebp
crhold+3: andl $0xfffffff0,%esp
crhold+6: pushl $0x1
crhold+8: movl 0x8(%ebp),%eax
crhold+0xb: pushl %eax
crhold+0xc: call -0x6e0b8 <atomic_add_32>
crhold+0x11: movl %ebp,%esp
crhold+0x13: popl %ebp
crhold+0x14: ret
> atomic_add_32::dis
atomic_add_32: movl 0x4(%esp),%eax
atomic_add_32+4: movl 0x8(%esp),%ecx
atomic_add_32+8: lock addl %ecx,(%eax)
atomic_add_32+0xb: ret

14.1.7. Displaying General-Purpose Registers

In this example, the system had a NULL pointer reference at atomic_add_ 32+8(0). The faulting instruction was atomic, referencing the memory at the location pointed to by %eax. By looking at the registers at the time of the panic, we can see that %eax was indeed NULL. The next step is to attempt to find out why %eax was NULL.

> ::regs
%cs = 0x0158 %eax = 0x00000000
%ds = 0xd9820160 %ebx = 0xde453000
%ss = 0x0000 %ecx = 0x00000001
%es = 0xfe8d0160 %edx = 0xd2a58de0
%fs = 0xfec30000 %esi = 0xdc062298
%gs = 0xfe8301b0 %edi = 0x00000000

%eip = 0xfe82ca58 atomic_add_32+8
%ebp = 0xd2a58828
%esp = 0xd2a58800

%eflags = 0x00010282
id=0 vip=0 vif=0 ac=0 vm=0 rf=1 nt=0 iopl=0x0
status=<of,df,IF,tf,SF,zf,af,pf,cf>

%uesp = 0xfe89ab0d
%trapno = 0xe
%err = 0x2

14.1.8. Navigating the Stack Backtrace

The function prototype for atomic_add_32() reveals that the first argument is a pointer to the memory location to be added. Since this was an x86 machine, the arguments reported by the stack backtrace are known to be useful, and we can look to see where the NULL pointer was handed downin this case nfs4_async_inactive().

void
atomic_add_32(volatile uint32_t *target, int32_t delta)
{
*target += delta;
}


> atomic_add_32::dis
atomic_add_32: movl 0x4(%esp),%eax
atomic_add_32+4: movl 0x8(%esp),%ecx
atomic_add_32+8: lock addl %ecx,(%eax)
atomic_add_32+0xb: ret
> $C
d2a58828 atomic_add_32+8(0)
d2a58854 nfs4_async_inactive+0x3b(dc1c29c0, 0)
d2a58880 nfs4_inactive+0x41()
d2a5889c fop_inactive+0x15(dc1c29c0, 0)
d2a588b0 vn_rele+0x4b(dc1c29c0)
...

> $C
d2a58828 atomic_add_32+8(0)
d2a58854 nfs4_async_inactive+0x3b(dc1c29c0, 0)
d2a58880 nfs4_inactive+0x41()
d2a5889c fop_inactive+0x15(dc1c29c0, 0)
d2a588b0 vn_rele+0x4b(dc1c29c0)
...
> nfs4_async_inactive+0x3b::dis
nfs4_async_inactive+0x1a: pushl $0x28
nfs4_async_inactive+0x1c: call +0x51faa30 <kmem_alloc>
nfs4_async_inactive+0x21: addl $0x8,%esp
nfs4_async_inactive+0x24: movl %eax,%esi
nfs4_async_inactive+0x26: movl $0x0,(%esi)
nfs4_async_inactive+0x2c: movl -0x4(%ebp),%eax
nfs4_async_inactive+0x2f: movl %eax,0x4(%esi)
nfs4_async_inactive+0x32: movl 0xc(%ebp),%edi
nfs4_async_inactive+0x35: pushl %edi
nfs4_async_inactive+0x36: call +0x51b7cdc <crhold>
nfs4_async_inactive+0x3b: addl $0x4,%esp
nfs4_async_inactive+0x3e: movl %edi,0x8(%esi)
nfs4_async_inactive+0x41: movl $0x4,0xc(%esi)
nfs4_async_inactive+0x48: leal 0xe0(%ebx),%eax
nfs4_async_inactive+0x4e: movl %eax,-0x8(%ebp)
nfs4_async_inactive+0x51: pushl %eax
nfs4_async_inactive+0x52: call +0x51477f4 <mutex_enter>
nfs4_async_inactive+0x57: addl $0x4,%esp
nfs4_async_inactive+0x5a: cmpl $0x0,0xd4(%ebx)
nfs4_async_inactive+0x61: je +0x7e <nfs4_async_inactive+0xdf>
nfs4_async_inactive+0x63: cmpl $0x0,0xd0(%ebx)
...

Looking at the disassembly, it appears that there is an additional function call, which is omitted from the stack backtrack (typically due to tail call compiler optimization). The call is to crhold(), passing the address of a credential structure from the arguments to nfs4_async_inactive(). Here we can see that crhold() does in fact call atomic_add_32().

/*
* Put a hold on a cred structure.
*/
void
crhold(cred_t *cr)
{
atomic_add_32(&cr->cr_ref, 1);
}


> crhold::dis
crhold: pushl %ebp
crhold+1: movl %esp,%ebp
crhold+3: andl $0xfffffff0,%esp
crhold+6: pushl $0x1
crhold+8: movl 0x8(%ebp),%eax
crhold+0xb: pushl %eax
crhold+0xc: call -0x6e0b8 <atomic_add_32>
crhold+0x11: movl %ebp,%esp
crhold+0x13: popl %ebp
crhold+0x14: ret

  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值