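Problem:
The tieba and favo programs dump core at startup. The crash is inside qsort, with signal 8 (SIGFPE, arithmetic exception).
Diagnosis:
The stack looks like this: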
#0 0x4202a801 in qsort () from /lib/i686/libc.so.6
#1 0x0804e74e in getFinalBSRes (databuf=0x406697fc) at frasbs.cpp:128
#2 0x0804ea74 in adjustDiffBSResOrder (databuf=0x406697fc) at frasbs.cpp:233
#3 0x0804e553 in BSSearch (databuf=0x406697fc) at frasbs.cpp:66
#4 0x0804e4b8 in getBSResponse (databuf=0x406697fc) at frasbs.cpp:29
#5 0x0804c559 in getResponse (databuf=0x406697fc) at fras.cpp:283
#6 0x0804c292 in thread_main (arg=0x1) at fras.cpp:222
#7 0x40020941 in pthread_start_thread () from /lib/i686/libpthread.so.0
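Look at the instruction that caused the core dump:
(gdb) x /i $eip
0x4202a801 <qsort+65>: divl 0x724(%ebx)
A division instruction faulted, which usually points to a divide-by-zero or a similar error.
I looked up the qsort source. Mind the version: download the source for exactly the glibc version in use, since glibc can differ from machine to machine.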
[rd@tc-forum-se01 forum]$ ldd ./bin/fras
libpthread.so.0 => /lib/i686/libpthread.so.0 (0x4001a000)
libcrypto.so.2 => /lib/libcrypto.so.2 (0x4004b000)
libc.so.6 => /lib/i686/libc.so.6 (0x42000000)
libdl.so.2 => /lib/libdl.so.2 (0x4011f000)
/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
[rd@tc-forum-se01 forum]$ /lib/i686/libc.so.6 -v
GNU C Library development release version 2.2.93, by Roland McGrath et al.
Copyright (C) 1992-2001, 2002 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 3.2 20020903 (Red Hat Linux 8.0 3.2-7).
Compiled on a Linux 2.4.9-9 system on 2002-09-05.
Available extensions:
I downloaded the glibc 2.2.93 source.
The qsort implementation is as follows (only part of it is listed):
void
qsort (void *b, size_t n, size_t s, __compar_fn_t cmp)
{
  const size_t size = n * s;
  if (size < 1024)
    {
      void *buf = __alloca (size);
      /* The temporary array is small, so put it on the stack. */
      msort_with_tmp (b, n, s, cmp, buf);
    }
  else
    {
      /* We should avoid allocating too much memory since this might
         have to be backed up by swap space. */
      static long int phys_pages;
      static int pagesize;
      if (phys_pages == 0)
        {
          phys_pages = __sysconf (_SC_PHYS_PAGES);
          if (phys_pages == -1)
            /* Error while determining the memory size. So let's
               assume there is enough memory. Otherwise the
               implementer should provide a complete implementation of
               the `sysconf' function. */
            phys_pages = (long int) (~0ul >> 1);
          /* The following determines that we will never use more than
             a quarter of the physical memory. */
          phys_pages /= 4;
          pagesize = __sysconf (_SC_PAGESIZE);
        }
      /* Just a comment here. We cannot compute
         phys_pages * pagesize
         and compare the needed amount of memory against this value.
         The problem is that some systems might have more physical
         memory then can be represented with a `size_t' value (when
         measured in bytes. */
      /* If the memory requirements are too high don't allocate memory. */
      if (size / pagesize > phys_pages)
        _quicksort (b, n, s, cmp);
      else
        {
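According to the disassembly, the faulting instruction corresponds exactly to the division size / pagesize in the test "if (size / pagesize > phys_pages)" above.
From here the problem is simple. The function has two static variables: static long int phys_pages; static int pagesize;
They are initialised lazily and without any locking: phys_pages is assigned first and pagesize last. With multiple threads, one thread can have already stored a non-zero phys_pages but not yet pagesize when a second thread enters qsort. The second thread sees phys_pages != 0, skips the initialisation, and evaluates size / pagesize while pagesize is still 0, hence signal 8.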
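The interleaving can be replayed deterministically in a few lines of C. This is only a sketch of the pattern, not glibc code: the two file-level variables stand in for the function-local statics in qsort, and the helper names (init_first_half, init_second_half, would_divide_by_zero) are made up for illustration.

```c
/* Hypothetical stand-ins for the two statics inside qsort. */
static long int phys_pages;
static int pagesize;

/* What thread A has done if it is preempted right after storing
   phys_pages: the first half of the lazy initialisation. */
static void init_first_half(long int total_pages)
{
    phys_pages = total_pages / 4;
}

/* The store thread A has not reached yet. */
static void init_second_half(int page_bytes)
{
    pagesize = page_bytes;
}

/* What thread B effectively evaluates on entry: returns 1 if it would
   skip the initialisation (phys_pages != 0) and then divide by a
   pagesize that is still zero. */
static int would_divide_by_zero(void)
{
    return phys_pages != 0 && pagesize == 0;
}
```

Calling init_first_half and then checking would_divide_by_zero before init_second_half runs reproduces the window in which the second thread faults.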
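Solution:
*In the main program, before any thread is started, call qsort once so that the static variables get their values.
*Other C library functions can be checked for the same pattern; if a similar problem is found, the same approach applies.
*Note: in that initial call, n*s must be at least 1024; otherwise the small-array branch is taken and pagesize is never initialised.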
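A minimal sketch of such a warm-up call, assuming it is invoked once from main() before the first thread is created. The names warm_up_qsort, compare_ints and WARMUP_COUNT are hypothetical; the only requirement taken from the analysis above is that the sorted array is at least 1024 bytes.

```c
#include <stdlib.h>

/* 512 ints is at least 1024 bytes on any platform (sizeof(int) >= 2),
   so the call goes through the branch that initialises pagesize. */
enum { WARMUP_COUNT = 512 };

static int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Call once while the process is still single-threaded.
   Returns 0 if the scratch array came back sorted, -1 otherwise. */
static int warm_up_qsort(void)
{
    static int scratch[WARMUP_COUNT];
    size_t i;

    /* Fill in descending order so the sort has real work to do. */
    for (i = 0; i < WARMUP_COUNT; i++)
        scratch[i] = (int)(WARMUP_COUNT - i);

    qsort(scratch, WARMUP_COUNT, sizeof scratch[0], compare_ints);

    for (i = 1; i < WARMUP_COUNT; i++)
        if (scratch[i - 1] > scratch[i])
            return -1;
    return 0;
}
```

In the programs discussed here, the call would go at the top of main(), ahead of whatever creates the worker threads that end up in thread_main.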