This article explains how to analyse a core dump on AIX with dbx.
We have two problems with Totalview (TV):
- The stack trace it displays for Visibroker is not relevant.
- With TV it is not possible to change the path to the loaded libraries used for the core dump analysis.
This article explains:
- How to start dbx.
- How to change the path to the loaded libraries with dbx.
- A small set of commands for dbx.
- Where to find the documentation for dbx.
1/ How to start dbx?
Simply run: dbx ./myProgram ./myCoreFile
2/ How to change the path to the loaded libraries with dbx?
On AIX, dbx accepts the "-p" flag, which the documentation describes as follows:
-p oldpath=newpath:...| pathfile
Specifies a substitution for library paths when examining core files in the format oldpath=newpath. oldpath specifies the value to be substituted (as stored in the core file) and newpath specifies what it is to be replaced with. These may be complete or partial, relative or absolute paths. Multiple substitutions may be specified, separated by colons. Alternatively, the -p flag may specify the name of a file from which mappings in the previously described format are to be read. Only one mapping per line is allowed when mappings are read from a file.
Example: dbx -p /soft=/users/username ./myProgram ./myCoreFile
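The description above also allows the mappings to be read from a file, one per line. The following is a minimal sketch of that variant, not a tested recipe: the file name libpaths.map and the second mapping are made up for illustration, and only the first mapping comes from the example above. The script prints the dbx command line instead of running it.

```shell
#!/bin/sh
# Hypothetical file name; any readable path should do.
MAPFILE=./libpaths.map

# One oldpath=newpath mapping per line, as the -p description requires.
# First line: the mapping from the example above.
# Second line: an invented mapping, shown only to illustrate several entries.
cat > "$MAPFILE" <<'EOF'
/soft=/users/username
/opt/vendor/lib=/users/username/vendor/lib
EOF

# Print the command line rather than running it, so the sketch is
# harmless on a machine where dbx is not installed.
echo "dbx -p $MAPFILE ./myProgram ./myCoreFile"
```

A file is easier to maintain than a long colon-separated list when several library directories have moved.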
3/ A small set of commands for dbx
corefile   displays high-level data about a core file
where      stack trace (defaults to the faulting thread)
proc       traits of the process when it dumped core
thread     pthreads data
kthread    information about kernel threads
fd         file descriptors at the time of the dump
map        shows which modules were loaded at the time of the dump
help       help information
up         go up one level in the stack
down       go down one level in the stack
print      display the contents of a variable
Example:
> dbx DEBUG/server_d core_spare
Type 'help' for help.
[using memory image in core_spare]
reading symbolic information ...warning: sep_version_info.cxx is newer than /users/username/xyz/shlib/libProcessAdapter_ss_d.so
Segmentation fault in _event_sleep at 0x9000000001677dc ($t14)
0x9000000001677dc (_event_sleep+0x108) e8410028 ld r2,0x28(r1)
(dbx) corefile
Process Name: DEBUG/server_d
Version: 500
Flags: FULL_CORE | CORE_VERSION_1 | MSTS_VALID | UBLOCK_VALID | USTACK_VALID | LE_VALID
Signal: SEGV
Process Mode: 64 bit
(dbx) thread
thread state-k wchan state-u k-tid mode held scope function
$t1 run running 2236513 u no pro _p_nsleep
$t2 run blocked 4874355 u no pro _event_sleep
$t3 run blocked 4923583 u no pro _event_sleep
$t4 run running 5263371 u no pro poll
$t5 run running 2875597 u no pro poll
$t6 run blocked u no pro _usched_swtch
$t7 run running 3203225 u no pro __fd_select
$t8 run blocked u no pro _usched_swtch
>$t14 run blocked 5316739 k no pro _event_sleep
$t10 run blocked 1404935 u no pro _event_sleep
$t11 run blocked 1835179 u no pro _event_sleep
$t12 run blocked 4440309 u no pro _event_sleep
$t13 run running 1790123 u no pro poll
$t15 run terminated u no pro
(dbx) where
_event_sleep(??, ??, ??, ??, ??, ??) at 0x9000000001677dc
_p_sigtimedwait(??, ??, ??) at 0x90000000016c7a0
pth_signal.sigwait(??, ??) at 0x90000000016d7e4
unnamed block in SignalHandlerImpl(void*)(0x9001000a1c671f8), line 96 in "SigHandler_AIX.cxx"
SignalHandlerImpl(void*)(0x9001000a1c671f8), line 96 in "SigHandler_AIX.cxx"
unnamed block in invoke_i()(0x1100c3e50), line 150 in "Thread_Adapter.cpp"
invoke_i()(0x1100c3e50), line 150 in "Thread_Adapter.cpp"
invoke()(0x1100c3e50), line 94 in "Thread_Adapter.cpp"
ace_thread_adapter(0x1100c3e50), line 132 in "Base_Thread_Adapter.cpp"
(dbx) thread current 1
(dbx) where
_p_nsleep(??, ??) at 0x90000000016cd58
raise.nsleep(??, ??) at 0x9000000002cb49c
nanosleep(??, ??) at 0x9000000002faabc
OS_NS_unistd.sleep(unsigned int)(0x493e0000493e0), line 1093 in "OS_NS_unistd.inl"
unnamed block in run()(0x110002850), line 42 in "server.cxx"
run()(0x110002850), line 42 in "server.cxx"
run()(0x11009c310), line 690 in "ProcessGuts.cxx"
unnamed block in processMain(int,char**)(0x11009c310, 0x300000003, 0xfffffffffffeca0), line 1058 in "ProcessGuts.cxx"
processMain(int,char**)(0x11009c310, 0x300000003, 0xfffffffffffeca0), line 1058 in "ProcessGuts.cxx"
main2(int,char**)(argc = 3, argv = 0x0fffffffffffeca0), line 80 in "main.cxx"
main(argc = 3, argv = 0x0fffffffffffeca0), line 90 in "main.cxx"
(dbx) thread
thread state-k wchan state-u k-tid mode held scope function
>$t1 run running 2236513 u no pro _p_nsleep
$t2 run blocked 4874355 u no pro _event_sleep
$t3 run blocked 4923583 u no pro _event_sleep
$t4 run running 5263371 u no pro poll
$t5 run running 2875597 u no pro poll
$t6 run blocked u no pro _usched_swtch
$t7 run running 3203225 u no pro __fd_select
$t8 run blocked u no pro _usched_swtch
*$t14 run blocked 5316739 k no pro _event_sleep
$t10 run blocked 1404935 u no pro _event_sleep
$t11 run blocked 1835179 u no pro _event_sleep
$t12 run blocked 4440309 u no pro _event_sleep
$t13 run running 1790123 u no pro poll
$t15 run terminated u no pro
(dbx) up
ProcessGuts.append(const char*,unsigned long)(0x110eda470, 0x1102a0930, 0x24), line 1062 in "cstring.h"
(dbx) print xstr
Object:(_guts = (nil))
SEPCString:(data_ = "/hedevecs01/SEPxxx/Main ")
()
(dbx)
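For a first look at a core, the subcommands listed in this section can be collected in a command file and passed to dbx with the -c flag (the same flag used in section 6). This is only a sketch: the file names triage.dbx and triage.txt are hypothetical, and the script prints the dbx command line instead of running it.

```shell
#!/bin/sh
# Hypothetical name for the dbx command file.
CMDFILE=./triage.dbx

# One dbx subcommand per line, all taken from the list in this section.
cat > "$CMDFILE" <<'EOF'
corefile
where
thread
map
EOF

# Print the command line rather than running it, so the sketch is
# harmless on a machine where dbx is not installed.
echo "dbx -c $CMDFILE ./myProgram ./myCoreFile > triage.txt"
```

As in section 6, dbx may still wait for input after the command file has been read, so type quit to end the session.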
4/ If you have trouble displaying the stack
If you get the following message:
warning: cannot open /soft/nsmsoft/nsm1/CCTServer/current/shlib/5/libSEP_ss.so(libSEP_ss.o)
use the -p option (see point 2 of this article).
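One way to build the -p mapping is to derive the old directory from the warning itself. The sketch below assumes the warning always has the form shown above, and that a copy of the library lives in /users/username/shlib, which is a hypothetical directory. The script prints the dbx command line instead of running it.

```shell
#!/bin/sh
# Warning text copied from the message above.
WARNING='warning: cannot open /soft/nsmsoft/nsm1/CCTServer/current/shlib/5/libSEP_ss.so(libSEP_ss.o)'
# Hypothetical local directory that holds a copy of the library.
NEWDIR=/users/username/shlib

# Strip the "warning: cannot open " prefix and the "(member)" suffix,
# which leaves the full path of the library dbx could not open.
MISSING_LIB=$(printf '%s\n' "$WARNING" | sed -e 's/^warning: cannot open //' -e 's/(.*)$//')

# The directory part is the "oldpath" stored in the core file.
OLDDIR=$(dirname "$MISSING_LIB")

# Assemble the oldpath=newpath mapping and print the command line.
MAPPING="$OLDDIR=$NEWDIR"
echo "dbx -p $MAPPING ./myProgram ./myCoreFile"
```

If several warnings name different directories, the mappings can be joined with colons as described in point 2.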
5/ Where to find more explanation of dbx
Simply run "man dbx" on the machine.
6/ Automate core analysis
a) Install the application that crashed, with all the required libraries:
helabct05-operator% ls -l /soft/nsmsoft/nsm2/epm_light/2.0STD9-A5.3/*
/soft/nsmsoft/nsm2/epm_light/2.0STD9-A5.3/bin:
-rwxr--r-- 1 operator users 613564 May 30 10:08 EPMLight
/soft/nsmsoft/nsm2/epm_light/2.0STD9-A5.3/param:
-rw-r--r-- 1 operator users 55866955 Nov 23 11:58 core
/soft/nsmsoft/nsm2/epm_light/2.0STD9-A5.3/shlib:
lrwxrwxrwx 1 operator users 8 Nov 23 11:56 1 -> ../shlib
lrwxrwxrwx 1 operator users 24 Nov 23 11:56 2 -> /soft/nsmsoft/nsm2/shlib
lrwxrwxrwx 1 operator users 1 Nov 23 11:56 3 -> .
lrwxrwxrwx 1 operator users 45 Nov 23 11:56 4 -> /soft/local/common/sep/2.4STD13-A5.3/vb_shlib
lrwxrwxrwx 1 operator users 42 Nov 23 11:56 5 -> /soft/local/common/sep/2.4STD13-A5.3/shlib
lrwxrwxrwx 1 operator users 40 Nov 23 11:56 6 -> /soft/local/common/osagent/current/shlib
lrwxrwxrwx 1 operator users 8 Nov 23 11:59 7 -> /usr/lib
-rw-r--r-- 1 operator users 51975 May 30 10:08 libMOB_EPMEvents.so
-rw-r--r-- 1 operator users 317046 May 30 10:08 libMOB_EPM_ss.so
b) Go into the param directory and start dbx to find out how many threads were active at crash time
helabct05-operator% dbx ../bin/EPMLight core
(dbx) thread
... --> lists the threads
(dbx) quit
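c) Generate a dbx script that displays every thread in the core
#!/bin/ksh
# Build a dbx command file named SCRIPT that prints the stack of every thread.
# Start from an empty file.
rm -f SCRIPT
touch SCRIPT
I=1
# 95 matches this core; set it to one more than the highest thread number listed in step b).
while [ $I -lt 95 ]
do
    # Blank line plus a "THREAD n" banner so each stack is easy to find in the output.
    echo "print \"\"" >> SCRIPT
    echo "print \"THREAD $I\"" >> SCRIPT
    # Make thread $I the current thread, then dump its stack.
    echo "thread current $I" >> SCRIPT
    echo "where" >> SCRIPT
    let I=$I+1
done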
d) Execute the dbx script and collect the result in a flat file
helabct05-operator% dbx -c SCRIPT ../bin/EPMLight core > RESULT
(dbx) quit
This generates a file called RESULT containing the stack of every thread in EPMLight at crash time.
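Thread numbers are not always contiguous (the session in section 3 lists $t14 between $t8 and $t10), so a fixed loop bound can miss or invent threads. As a hedged alternative to step c), the sketch below derives the thread ids from saved "thread" output. The file name threads.txt is hypothetical, its sample lines are copied from the session in section 3, and on a real core it would hold the saved output of step b).

```shell
#!/bin/sh
# Sample "thread" output; a real run would save the output of step b) here.
cat > threads.txt <<'EOF'
 thread state-k wchan state-u k-tid mode held scope function
 $t1 run running 2236513 u no pro _p_nsleep
 $t2 run blocked 4874355 u no pro _event_sleep
>$t14 run blocked 5316739 k no pro _event_sleep
 $t15 run terminated u no pro
EOF

# Extract the number from each $tN token; the header line has no "$"
# and is therefore skipped. Then emit the dbx commands for that thread.
sed -n 's/^[^$]*\$t\([0-9][0-9]*\).*/\1/p' threads.txt |
while read -r id
do
    echo "print \"THREAD $id\""
    echo "thread current $id"
    echo "where"
done > SCRIPT
```

The resulting SCRIPT file can be passed to dbx -c exactly as in step d).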