使用mdb调试Solaris内核crash文件

最新推荐文章于 2023-03-09 07:15:00 发布

opendba

最新推荐文章于 2023-03-09 07:15:00 发布

阅读量1.3k

点赞数

分类专栏： 71.solaris 文章标签： crash solaris thread input function c

71.solaris 专栏收录该内容

9 篇文章 0 订阅

订阅专栏

转自：http://blog.csdn.net/hotsolaris/archive/2008/03/10/2163306.aspx

Working with Kernel Cores

The most common type of kernel debug target is a core file, saved from a prior system crash. In the following sections, we highlight some of the introductory steps as used with mdb to explore a kernel core image.

14.1.1. Locating and Attaching the Target

If a system has crashed, then we should have a core image saved in /var/crash on the target machine. The mdb debugger should be invoked from a system with the same architecture and Solaris revision as the crash image. The first steps are to locate the appropriate saved image and then to invoke mdb.

#  cd /var/crash/nodename

# ls
bounds    unix.1    unix.3    unix.5    unix.7    vmcore.1  vmcore.3  vmcore.5  vmcore.7
unix.0    unix.2    unix.4    unix.6    vmcore.0  vmcore.2  vmcore.4  vmcore.6

#  mdb -k unix.7 vmcore.7
Loading modules: [ unix krtld$c
 genunix specfs dtrace ufs ip sctp usba uhci s1394 fcp fctl nca lofs zfs random nfs
audiosup sppp crypto md fcip logindmux ptm ipc ]
>

14.1.2. Examining Kernel Core Summary Information

The kernel core contains important summary information from which we can extract the following:

Revision of the kernel
Hostname
CPU and platform architecture of the system
Panic string
Module causing the panic

We can use the ::showrev and ::status dcmds to extract this information.

> ::showrev
Hostname: zones-internal
Release: 5.11
Kernel architecture: i86pc
Application architecture: i386
Kernel version: SunOS 5.11 i86pc snv_27
Platform: i86pc
> ::status
debugging crash dump vmcore.2 (32-bit) from zones-internal
operating system: 5.11 snv_27 (i86pc)
panic message: BAD TRAP: type=e (#pf Page fault) rp=d2a587c8 addr=0 occurred in module
"unix" due to a NULL pointer dereference
dump content: kernel pages only
> ::panicinfo
             cpu         0
          thread d2a58de0
         message BAD TRAP: type=e (#pf Page fault) rp=d2a587c8 addr=0 occurred in module
"unix" due to a NULL pointer dereference
              gs fe8301b0
              fs fec30000
              es fe8d0160
              ds d9820160
             edi        0
             esi dc062298
             ebp d2a58828
             esp d2a58800
             ebx de453000
             edx d2a58de0
             ecx        1
             eax        0
          trapno        e
             err        2
             eip fe82ca58
              cs      158
          eflags    10282
            uesp fe89ab0d
             ss         0
             gdt fec1f2f002cf
             idt fec1f5c007ff
             ldt      140
            task      150
             cr0 8005003b
             cr2        0
             cr3  4cb3000
             cr4      6d8

14.1.3. Examining the Message Buffer

The kernel keeps a cyclic buffer of the recent kernel messages. In this buffer we can observe the messages up to the time of the panic. The ::msgbuf dcmd shows the contents of the buffer.

> ::msgbuf
MESSAGE
/pseudo/zconsnex@1/zcons@5 (zcons5) online
/pseudo/zconsnex@1/zcons@6 (zcons6) online
/pseudo/zconsnex@1/zcons@7 (zcons7) online
pseudo-device: ramdisk1024
...
panic[cpu0]/thread=d2a58de0:
BAD TRAP: type=e (#pf Page fault) rp=d2a587c8 addr=0 occurred in module "unix" due to a
NULL pointer dereference


sched:
#pf Page fault
Bad kernel fault at addr=0x0
pid=0, pc=0xfe82ca58, sp=0xfe89ab0d, eflags=0x10282
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6d8<xmme,fxsr,pge,mce,pse,de>
cr2: 0 cr3: 4cb3000
         gs: fe8301b0  fs: fec30000  es: fe8d0160  ds: d9820160
        edi:        0 esi: dc062298 ebp: d2a58828 esp: d2a58800
        ebx: de453000 edx: d2a58de0 ecx:        1 eax:        0
        trp:        e err:        2 eip: fe82ca58  cs:      158
        efl:    10282 usp: fe89ab0d  ss:        0
...

14.1.4. Obtaining a Stack Trace of the Running Thread

We can obtain a stack backtrace of the current thread by using the $C command. Note that the displayed arguments to each function are not necessarily accurate. On each platform, the meaning of the shown arguments is as follows:

SPARC. The values of the arguments if they are available from a saved stack frame, assuming they are not overwritten by use of registers during the called function. With SPARC architectures, a function's input argument registers are sometimes saved on the way out of a functionif the input registers are reused during the function, then values of the input arguments are overwritten and lost.
x86. Accurate values of the input arguments. Input arguments are always saved onto the stack and can be accurately displayed
x64. The values of the arguments, assuming they are available. As with the SPARC architectures, input arguments are passed in registers and may be overwritten.

> $C
d2a58828 atomic_add_32+8(0)
d2a58854 nfs4_async_inactive+0x3b(dc1c29c0, 0)
d2a58880 nfs4_inactive+0x41()
d2a5889c fop_inactive+0x15(dc1c29c0, 0)
d2a588b0 vn_rele+0x4b(dc1c29c0)
d2a588c0 snf_smap_desbfree+0x59(dda94080)
d2a588dc dblk_lastfree_desb+0x13(de45b520, d826fb40)
d2a588f4 dblk_decref+0x4e(de45b520, d826fb40)
d2a58918 freemsg+0x69(de45b520)
d2a5893c FreeTxSwPacket+0x3b(d38b84f0)
d2a58968 CleanTxInterrupts+0xb4(d2f9cac0)
d2a589a4 e1000g_send+0xf6(d2f9cac0, d9ffba00)
d2a589c0 e1000g_m_tx+0x22()
d2a589dc dls_tx+0x16(d4520f68, d9ffba00)
d2a589f4 str_mdata_fastpath_put+0x1e(d3843f20, d9ffba00)
d2a58a40 tcp_send_data+0x62d(db0ecac0, d97ee250, d9ffba00)
d2a58aac tcp_send+0x6b6(d97ee250, db0ecac0, 564, 28, 14, 0)
d2a58b40 tcp_wput_data+0x622(db0ecac0, 0, 0)
d2a58c28 tcp_rput_data+0x2560(db0ec980, db15bd20, d2d45f40)
d2a58c40 tcp_input+0x3c(db0ec980, db15bd20, d2d45f40)
d2a58c78 squeue_enter_chain+0xe9(d2d45f40, db15bd20, db15bd20, 1, 1)
d2a58cec ip_input+0x658(d990e554, d3164010, 0, e)
d2a58d40 i_dls_link_ether_rx+0x156(d4523db8, d3164010, db15bd20)
d2a58d70 mac_rx+0x56(d3520200, d3164010, db15bd20)
d2a58dac e1000g_intr+0xa6(d2f9cac0, 0)
d2a58ddc intr_thread+0x122()

14.1.5. Which Process?

If the stack trace is of a kernel housekeeping or interrupt thread, the process reported for the thread will be that of p0"sched." The process pointer for the thread can be obtained with ::tHRead, and ::ps will then display summary information about that process. In this example, the thread is an interrupt thread (as indicated by the top entry in the stack from $C), and the process name maps to sched.

> d2a58de0::thread -p
    ADDR     PROC      LWP      CRED
d2a58de0 fec1d280        0 d9d1cf38
> fec1d280::ps -t
S    PID   PPID   PGID    SID    UID      FLAGS     ADDR NAME
R      0      0      0      0      0 0x00000001 fec1d280 sched
        T        t0 <TS_STOPPED>

14.1.6. Disassembling the Suspect Code

Once we've located the thread of interest, we often learn more about what happened by disassembling the target and looking at the instruction that reportedly caused the panic. MDB's ::dis dcmd will disassemble the code around the target instruction that we extract from the stack backtrace.

> $C
d2a58828 atomic_add_32+8(0)
d2a58854 nfs4_async_inactive+0x3b(dc1c29c0, 0)
d2a58880 nfs4_inactive+0x41()
d2a5889c fop_inactive+0x15(dc1c29c0, 0)
d2a588b0 vn_rele+0x4b(dc1c29c0)
...
> nfs4_async_inactive+0x3b::dis
nfs4_async_inactive+0x1a:       pushl  $0x28
nfs4_async_inactive+0x1c:       call   +0x51faa30       <kmem_alloc>
nfs4_async_inactive+0x21:       addl   $0x8,%esp
nfs4_async_inactive+0x24:       movl   %eax,%esi
nfs4_async_inactive+0x26:       movl   $0x0,(%esi)
nfs4_async_inactive+0x2c:       movl   -0x4(%ebp),%eax
nfs4_async_inactive+0x2f:       movl   %eax,0x4(%esi)
nfs4_async_inactive+0x32:       movl   0xc(%ebp),%edi
nfs4_async_inactive+0x35:       pushl  %edi
nfs4_async_inactive+0x36:       call   +0x51b7cdc       <crhold>
nfs4_async_inactive+0x3b:       addl   $0x4,%esp
nfs4_async_inactive+0x3e:       movl   %edi,0x8(%esi)
nfs4_async_inactive+0x41:       movl   $0x4,0xc(%esi)
nfs4_async_inactive+0x48:       leal   0xe0(%ebx),%eax
nfs4_async_inactive+0x4e:       movl   %eax,-0x8(%ebp)
nfs4_async_inactive+0x51:       pushl  %eax
nfs4_async_inactive+0x52:       call   +0x51477f4       <mutex_enter>
nfs4_async_inactive+0x57:       addl   $0x4,%esp
nfs4_async_inactive+0x5a:       cmpl   $0x0,0xd4(%ebx)
nfs4_async_inactive+0x61:       je     +0x7e    <nfs4_async_inactive+0xdf>
nfs4_async_inactive+0x63:       cmpl   $0x0,0xd0(%ebx)
> crhold::dis
crhold:                         pushl  %ebp
crhold+1:                       movl   %esp,%ebp
crhold+3:                       andl   $0xfffffff0,%esp
crhold+6:                       pushl  $0x1
crhold+8:                       movl   0x8(%ebp),%eax
crhold+0xb:                     pushl  %eax
crhold+0xc:                     call   -0x6e0b8 <atomic_add_32>
crhold+0x11:                    movl   %ebp,%esp
crhold+0x13:                    popl   %ebp
crhold+0x14:                    ret
> atomic_add_32::dis
atomic_add_32:                  movl   0x4(%esp),%eax
atomic_add_32+4:                movl   0x8(%esp),%ecx
atomic_add_32+8:                lock   addl %ecx,(%eax)
atomic_add_32+0xb:              ret

14.1.7. Displaying General-Purpose Registers

In this example, the system had a NULL pointer reference at atomic_add_ 32+8(0). The faulting instruction was atomic, referencing the memory at the location pointed to by %eax. By looking at the registers at the time of the panic, we can see that %eax was indeed NULL. The next step is to attempt to find out why %eax was NULL.

> ::regs
%cs = 0x0158             %eax = 0x00000000
%ds = 0xd9820160                 %ebx = 0xde453000
%ss = 0x0000             %ecx = 0x00000001
%es = 0xfe8d0160                 %edx = 0xd2a58de0
%fs = 0xfec30000                 %esi = 0xdc062298
%gs = 0xfe8301b0                 %edi = 0x00000000

%eip = 0xfe82ca58 atomic_add_32+8
%ebp = 0xd2a58828
%esp = 0xd2a58800

%eflags = 0x00010282
  id=0 vip=0 vif=0 ac=0 vm=0 rf=1 nt=0 iopl=0x0
  status=<of,df,IF,tf,SF,zf,af,pf,cf>

  %uesp = 0xfe89ab0d
%trapno = 0xe
   %err = 0x2

14.1.8. Navigating the Stack Backtrace

The function prototype for atomic_add_32() reveals that the first argument is a pointer to the memory location to be added. Since this was an x86 machine, the arguments reported by the stack backtrace are known to be useful, and we can look to see where the NULL pointer was handed downin this case nfs4_async_inactive().

void
atomic_add_32(volatile uint32_t *target, int32_t delta)
{
        *target += delta;
}


> atomic_add_32::dis
atomic_add_32:                  movl    0x4(%esp),%eax
atomic_add_32+4:                movl    0x8(%esp),%ecx
atomic_add_32+8:                lock addl %ecx,(%eax)
atomic_add_32+0xb:              ret
> $C
d2a58828 atomic_add_32+8(0)
d2a58854 nfs4_async_inactive+0x3b(dc1c29c0, 0)
d2a58880 nfs4_inactive+0x41()
d2a5889c fop_inactive+0x15(dc1c29c0, 0)
d2a588b0 vn_rele+0x4b(dc1c29c0)
...

> $C
d2a58828 atomic_add_32+8(0)
d2a58854 nfs4_async_inactive+0x3b(dc1c29c0, 0)
d2a58880 nfs4_inactive+0x41()
d2a5889c fop_inactive+0x15(dc1c29c0, 0)
d2a588b0 vn_rele+0x4b(dc1c29c0)
...
> nfs4_async_inactive+0x3b::dis
nfs4_async_inactive+0x1a:       pushl  $0x28
nfs4_async_inactive+0x1c:       call   +0x51faa30       <kmem_alloc>
nfs4_async_inactive+0x21:       addl   $0x8,%esp
nfs4_async_inactive+0x24:       movl   %eax,%esi
nfs4_async_inactive+0x26:       movl   $0x0,(%esi)
nfs4_async_inactive+0x2c:       movl   -0x4(%ebp),%eax
nfs4_async_inactive+0x2f:       movl   %eax,0x4(%esi)
nfs4_async_inactive+0x32:       movl   0xc(%ebp),%edi
nfs4_async_inactive+0x35:       pushl  %edi
nfs4_async_inactive+0x36:       call   +0x51b7cdc       <crhold>
nfs4_async_inactive+0x3b:       addl   $0x4,%esp
nfs4_async_inactive+0x3e:       movl   %edi,0x8(%esi)
nfs4_async_inactive+0x41:       movl   $0x4,0xc(%esi)
nfs4_async_inactive+0x48:       leal   0xe0(%ebx),%eax
nfs4_async_inactive+0x4e:       movl   %eax,-0x8(%ebp)
nfs4_async_inactive+0x51:       pushl  %eax
nfs4_async_inactive+0x52:       call   +0x51477f4       <mutex_enter>
nfs4_async_inactive+0x57:       addl   $0x4,%esp
nfs4_async_inactive+0x5a:       cmpl   $0x0,0xd4(%ebx)
nfs4_async_inactive+0x61:       je     +0x7e    <nfs4_async_inactive+0xdf>
nfs4_async_inactive+0x63:       cmpl   $0x0,0xd0(%ebx)
...

Looking at the disassembly, it appears that there is an additional function call, which is omitted from the stack backtrack (typically due to tail call compiler optimization). The call is to crhold(), passing the address of a credential structure from the arguments to nfs4_async_inactive(). Here we can see that crhold() does in fact call atomic_add_32().

/*
 * Put a hold on a cred structure.
 */
void
crhold(cred_t *cr)
{
        atomic_add_32(&cr->cr_ref, 1);
}


> crhold::dis
crhold:                         pushl  %ebp
crhold+1:                       movl   %esp,%ebp
crhold+3:                       andl   $0xfffffff0,%esp
crhold+6:                       pushl  $0x1
crhold+8:                       movl   0x8(%ebp),%eax
crhold+0xb:                     pushl  %eax
crhold+0xc:                     call   -0x6e0b8 <atomic_add_32>
crhold+0x11:                    movl   %ebp,%esp
crhold+0x13:                    popl   %ebp
crhold+0x14:                    ret