This is a good article; bookmarking it here. Link to the original:
https://blogs.oracle.com/dbx/entry/pstack_coreadm_and_symbol_tables
pstack, coreadm and symbol tables
By Maxim Kartashev on Apr 16, 2010
Where do symbol names come from?
In ELF files, symbols reside in two sections: .symtab and .dynsym. On recent versions of Solaris, there is a new section, .SUNW_ldynsym, but for the purposes of this article it is identical to .dynsym, so I'll keep it simple and not talk about it.
Both sections are essentially tables that map a name to a value; here we are interested in function names, so the value is the function's address. When pstack unwinds the stack (starting from the values of the $pc and $fp/$sp registers, which come from a special NOTE segment of the core file), it goes through the symbol tables of all the files involved and finds the symbol with the closest value.
For example, suppose we have this core file:
$ pstack core
core 'core' of 7719: ./a.out
 fece586c strlen (8050ada, 8047a38, fed91c20, 0) + c
 fed40814 printf (8050ad8, 0) + a8
 08050969 ???????? (0, 8047b30, 8047a84, 80508bd, 1, 8047a90)
 080509a2 main (1, 8047a90, 8047a98, fed93e40) + 12
 080508bd _start (1, 8047b98, 0, 8047ba0, 8047bdc, 8047be7) + 7d
The address fece586c belongs to libc.so.1, as can be seen from the pmap(1) output:
$ pmap core
core 'core' of 7719: ./a.out
08046000 8K rwx-- [ stack ]
08050000 4K r-x--
08060000 4K rwx--
08061000 128K rwx-- [ heap ]
>>>FECC0000 760K r-x-- /lib/libc.so.1 <<<
FED8E000 32K rw--- /lib/libc.so.1
FED96000 8K rw--- /lib/libc.so.1
...
It is in the code segment (the r-x-- permissions give that away) of /lib/libc.so.1.
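The containment check done by eye above can be sketched in a few lines of Python. This is only an illustration: the MAPPINGS list is transcribed by hand from the pmap listing, and find_mapping is a hypothetical helper, not part of any Solaris tool.

```python
# Mappings transcribed from the pmap listing:
# (start address, size in KiB, permissions, name).
# An empty name means pmap printed none for that mapping.
MAPPINGS = [
    (0x08046000, 8, "rwx--", "[ stack ]"),
    (0x08050000, 4, "r-x--", ""),
    (0x08060000, 4, "rwx--", ""),
    (0x08061000, 128, "rwx--", "[ heap ]"),
    (0xFECC0000, 760, "r-x--", "/lib/libc.so.1"),
    (0xFED8E000, 32, "rw---", "/lib/libc.so.1"),
    (0xFED96000, 8, "rw---", "/lib/libc.so.1"),
]


def find_mapping(addr):
    """Return the mapping whose [start, start + size) range contains addr, or None."""
    for start, size_kb, perms, name in MAPPINGS:
        if start <= addr < start + size_kb * 1024:
            return (start, size_kb, perms, name)
    return None


# The top frame of the stack trace lands in the executable part of libc.so.1.
print(find_mapping(0xFECE586C))
```

The same helper places 0x08050969 in the unnamed r-x-- mapping at 0x08050000, which is all that can be said about that frame without a symbol table.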
Looking at libc.so.1 with elfdump, we can see that the global function strlen starts at offset 0x25860:
$ elfdump -s /usr/lib/libc.so.1 | grep strlen
[2603] 0x00025860 0x00000045 FUNC GLOB D 37 .text strlen
So in our deceased process it resided at 0xFECC0000 (the base address of libc.so.1 in memory) + 0x25860 = 0xFECE5860. Hence 0xfece586c is 0xFECE5860 + 0xc, which is strlen+0xc.
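The arithmetic is easy to double-check. Here is a small Python sketch using only the numbers from the listings above; the variable names are mine.

```python
LIBC_BASE = 0xFECC0000    # start of the libc.so.1 code mapping (from pmap)
STRLEN_OFFSET = 0x25860   # value of strlen in libc.so.1's symbol table (from elfdump)
FRAME_PC = 0xFECE586C     # address pstack printed for the top frame

# Address of strlen in the process = load address of the library + symbol value.
strlen_addr = LIBC_BASE + STRLEN_OFFSET
# Distance from the start of the function to the address in the frame.
offset_in_function = FRAME_PC - strlen_addr

print(f"strlen is at {strlen_addr:#x}")
print(f"frame is strlen+{offset_in_function:#x}")
```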
Symbol tables
As you can see in the example above, not all symbols were found. In this case, the address 0x08050969 was not mapped to any symbol. That address belongs to the a.out code segment starting at 0x08050000, and that's all we can tell. Yet another address from the same segment, 0x080509a2, does resolve to a symbol: main.
The difference arises because those two symbols were present in different symbol tables, while an executable file may carry only one of them: .dynsym (strictly speaking, that probably applies to dynamic executables only, but since Solaris 10 strongly discourages static linking, we almost always deal with dynamic executables and shared libraries). The .dynsym section is used by the run-time linker (ld.so.1(1)) and contains the global names that the program "exports" to or "imports" from libraries; a call to "main" is resolved at run time by looking up the name "main" in the .dynsym section and jumping to the address associated with the symbol found. Since this information is absolutely necessary at run time, the .dynsym section always resides in a loadable segment and is always part of the process memory image (and thus of a core file).
On the other hand, the .symtab section, which contains all symbols, including local ones, is useful mostly when linking relocatable object files (*.o). References inside one file can be resolved at compile time using offsets, so static functions do not need a name at run time; they are called directly using an offset from the current position. This is why the .symtab section does not belong to a loadable segment and does not contribute to the process's memory image in any way. And this is why it [used to be] customary to remove the symbol table from final executables (using strip(1), for example) to save space and make the life of support engineers harder.
In our case, ./a.out was indeed stripped:
$ elfdump -c a.out | grep symtab
$ elfdump -c a.out | grep dynsym
Section Header[4]: sh_name: .dynsym
It does have .dynsym, but no .symtab. By the way, the main symbol is indeed present in .dynsym and has the address 0x08050990:
$ elfdump -s -N .dynsym a.out | grep main
[28] 0x08050990 0x0000001a FUNC GLOB D 0 .text main
Loadable objects (executables and shared libraries)
Let's recompile a.out, this time without stripping it, and see how it helps:
$ CC510 a.cc
$ ./a.out
Segmentation Fault (core dumped)
$ pstack core
core 'core' of 11761: ./a.out
 fece586c strlen (8050ada, 8047a38, fed91c20, 0) + c
 fed40814 printf (8050ad8, 0) + a8
 08050969 __1cDfoo6F_i_ (0, 8047b30, 8047a84, 80508bd, 1, 8047a90) + 19
 080509a2 main (1, 8047a90, 8047a98, fed93e40) + 12
 080508bd _start (1, 8047b98, 0, 8047ba0, 8047bdc, 8047be7) + 7d
We can now see the name __1cDfoo6F_i_ (the mangled name of int foo()) instead of ???, but where does pstack get this information? __1cDfoo6F_i_ is not present in .dynsym, so there was no information about this name in the memory image of the process when it died:
$ strings core | grep __1cDfoo6F_i_
pstack(1) is smarter than that: it finds out which program generated this core file, locates it and uses its .symtab (if present, of course) to map symbols. Here's an excerpt from proc(1):
Some of the proc tools can need to derive the name of the executable corresponding to the process which dumped core or the names of shared libraries associated with the process. These files are needed, for example, to provide symbol table information for pstack(1). If the proc tool in question is unable to locate the needed executable or shared library, some symbol information is unavailable for display.
Let's delete a.out and see what happens:
$ rm a.out
$ pstack core
core 'core' of 11761: ./a.out
 fece586c strlen (8050ada, 8047a38, fed91c20, 0) + c
 fed40814 printf (8050ad8, 0) + a8
 08050969 ???????? (0, 8047b30, 8047a84, 80508bd, 1, 8047a90)
 080509a2 main (1, 8047a90, 8047a98, fed93e40) + 12
 080508bd _start (1, 8047b98, 0, 8047ba0, 8047bdc, 8047be7) + 7d
We immediately got our ???'s back.
So pstack uses the core file as well as the executable and libraries to print readable names in the stack trace.
Core file contents
If you have to send your core file to another person for inspection, you put him at a disadvantage: that person might not have your executable, and even the system libraries might be slightly different. If pstack goes looking for the address-to-symbol mapping in those files, it might end up printing wrong symbol names and question marks, making the core file more harmful than helpful.
There is a way to embed symbol tables into the core file: the coreadm(1M) command. It lets you specify what kind of content you want the system to put into the core file, and it can even direct the system to pull .symtab from the executable and shared libraries:
# coreadm -I default+symtab
(do this as root).
More information on coreadm can be found in its man page: coreadm(1M).
Side note: in fact, the symbol tables of libc.so.1 and ld.so.1 were present in my core file even without the "symtab" content requested, as can be seen with elfdump -c core; this seems to be an undocumented but useful feature.
Let's turn .symtab inclusion on and see if it helps:
$ su -
# coreadm -I default+symtab
# exit
$ ./a.out
Segmentation Fault (core dumped)
$ rm a.out
$ pstack core
core 'core' of 13604: ./a.out
 fece586c strlen (8050ada, 8047a38, fed91c20, 0) + c
 fed40814 printf (8050ad8, 0) + a8
 08050969 __1cDfoo6F_i_ (0, 8047b30, 8047a84, 80508bd, 1, 8047a90) + 19
 080509a2 main (1, 8047a90, 8047a98, fed93e40) + 12
 080508bd _start (1, 8047b98, 0, 8047ba0, 8047bdc, 8047be7) + 7d
The core file now contains many symbol tables, one per load object:
$ elfdump -c core | grep symtab
Section Header[1]: sh_name: .symtab
Section Header[3]: sh_name: .symtab
Section Header[6]: sh_name: .symtab
Section Header[8]: sh_name: .symtab
Section Header[10]: sh_name: .symtab
Section Header[12]: sh_name: .symtab
and one of them has the definition of our int foo() function, which starts at 0x08050950:
$ elfdump -s core | grep foo
[56] 0x08050950 0x00000034 FUNC LOCL D 0 __1cDfoo6F_i_
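To see why that single .symtab entry makes the difference, here is a rough Python sketch of an address-to-symbol lookup. It is not how pstack is actually implemented: resolve and frame are hypothetical helpers, the tables hold only the two entries shown in the elfdump listings of this article, and the check against the symbol size is my assumption about how an address that falls outside every known function ends up as question marks.

```python
import bisect

# (address, size, name) entries copied from the elfdump listings in this article.
DYNSYM = [(0x08050990, 0x1A, "main")]
SYMTAB_EXTRA = [(0x08050950, 0x34, "__1cDfoo6F_i_")]


def resolve(addr, symbols):
    """Return (name, offset) for the symbol covering addr, or None if there is none."""
    table = sorted(symbols)
    starts = [start for start, _size, _name in table]
    i = bisect.bisect_right(starts, addr) - 1  # nearest symbol at or below addr
    if i < 0:
        return None
    start, size, name = table[i]
    if addr >= start + size:  # assumption: reject addresses past the symbol's end
        return None
    return name, addr - start


def frame(addr, symbols):
    """Format an address roughly the way the stack traces above show it."""
    hit = resolve(addr, symbols)
    return "????????" if hit is None else f"{hit[0]} + {hit[1]:x}"


print(frame(0x08050969, DYNSYM))                  # only .dynsym available
print(frame(0x08050969, DYNSYM + SYMTAB_EXTRA))   # .symtab entry available too
print(frame(0x080509A2, DYNSYM))                  # main resolves either way
```

With only the .dynsym entry, 0x08050969 has no symbol at or below it and stays unresolved; once the foo entry is added, it lands 0x19 bytes into __1cDfoo6F_i_, which matches the "+ 19" in the stack trace.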
How to prevent ??? from appearing in a stack trace?
Use pstack on the same machine
First and foremost, you can avoid many problems by first running pstack on the same machine where the core file was generated. This ensures that pstack uses the same binary and libraries as the process that generated the core. Otherwise, you might end up looking at wrong symbols or (in the best case) a lot of question marks.
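If you collect stack traces from several machines, a quick way to spot the ones that lost their symbols is to scan the pstack output for frames made of question marks. The Python sketch below is a hypothetical helper of mine, not a Solaris tool, and it assumes frame lines shaped like the ones shown in this article (a hex address, a name, then the arguments in parentheses).

```python
import re

# Assumed frame line shape: "<hex address> <name> (<args>) [+ <hex offset>]"
FRAME_RE = re.compile(r"^\s*([0-9a-f]{8,16})\s+(\S+)\s+\(")


def unresolved_frames(pstack_output):
    """Return the addresses of frames whose name consists only of '?' characters."""
    missing = []
    for line in pstack_output.splitlines():
        m = FRAME_RE.match(line)
        if m and set(m.group(2)) == {"?"}:
            missing.append(m.group(1))
    return missing


# Sample text copied from the first stack trace in this article.
SAMPLE = """\
core 'core' of 7719: ./a.out
 fece586c strlen (8050ada, 8047a38, fed91c20, 0) + c
 fed40814 printf (8050ad8, 0) + a8
 08050969 ???????? (0, 8047b30, 8047a84, 80508bd, 1, 8047a90)
 080509a2 main (1, 8047a90, 8047a98, fed93e40) + 12
 080508bd _start (1, 8047b98, 0, 8047ba0, 8047bdc, 8047be7) + 7d
"""

print(unresolved_frames(SAMPLE))
```

Note that this only catches missing names; it cannot tell you when pstack printed a wrong name because it picked up a different version of a library.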
Don't strip binaries
In Solaris, it is no longer customary to strip binaries (see http://blogs.sun.com/ali/entry/which_solaris_files_are_stripped). The space savings are questionable and the performance of an unstripped binary does not suffer, so why make life difficult for those who will debug it?
Don't delete binaries
By default, Solaris does not include .symtab in core files (except for libc.so.1 and ld.so.1, as I mentioned earlier, but that is not relevant here, where we are talking about user executables and libraries). So if you delete or move an executable or library after the core file was generated, pstack won't be able to find its .symtab and thus won't be able to map addresses to local function names.
In other words, unless you've changed the core file contents with coreadm(1M), don't delete your binaries before you have had a chance to inspect the core file. They are needed.
Use coreadm
Most of the problems above can be eliminated in one blow:
# coreadm -I default+symtab
This tells the system to pull the .symtab sections from every binary involved in the process and put them into the core file. You no longer need the binaries to see names instead of numbers in the stack trace.