tcmalloc causes the child process to deadlock after fork

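        While launching child processes from a multithreaded program, one child failed to start. After fork it neither reached the exec call nor exited, but hung in an intermediate state. The visible symptom was an extra process in the environment with the same name as the parent process.

        The child process's stack looked like this: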

#0  0x00007ff1d1ae6c83 in sys_futex (t=0x7ff1ac6fd1b0, v=2, o=128, a=0x7ff1d1cfdec0 <tcmalloc::Static::central_cache_+8512>) at ./src/base/linux_syscall_support.h:1787
#1  base::internal::SpinLockDelay (w=w@entry=0x7ff1d1cfdec0 <tcmalloc::Static::central_cache_+8512>, value=2, loop=loop@entry=6350) at ./src/base/spinlock_linux-inl.h:87
#2  0x00007ff1d1ae6ed7 in SpinLock::SlowLock (this=this@entry=0x7ff1d1cfdec0 <tcmalloc::Static::central_cache_+8512>) at src/base/spinlock.cc:132
#3  0x00007ff1d1ae0630 in Lock (this=0x7ff1d1cfdec0 <tcmalloc::Static::central_cache_+8512>) at src/base/spinlock.h:75
#4  tcmalloc::CentralFreeList::RemoveRange (this=0x7ff1d1cfdec0 <tcmalloc::Static::central_cache_+8512>, start=start@entry=0x7ff1ac6fd290, end=end@entry=0x7ff1ac6fd298, N=24) at src/central_freelist.cc:247
#5  0x00007ff1d1ae32f3 in tcmalloc::ThreadCache::FetchFromCentralCache (this=0x13badc0, cl=<optimized out>, byte_size=96) at src/thread_cache.cc:162
#6  0x00007ff1d1aecaf8 in Allocate (cl=<optimized out>, size=<optimized out>, this=<optimized out>) at src/thread_cache.h:341
#7  do_malloc (size=<optimized out>) at src/tcmalloc.cc:1068
#8  cpp_alloc (nothrow=false, size=88) at src/tcmalloc.cc:1354
#9  tc_newarray (size=88) at src/tcmalloc.cc:1560
#10 0x00007ff1d33d6840 in Process::launch(char const*, int, char**) () 

        The child is stuck on a lock inside the memory allocator, tcmalloc.

        The relevant tcmalloc code is:

int CentralFreeList::RemoveRange(void **start, void **end, int N) {
  ASSERT(N > 0);
  lock_.Lock();
  if (N == Static::sizemap()->num_objects_to_move(size_class_) &&
      used_slots_ > 0) {
    int slot = --used_slots_;
    ASSERT(slot >= 0);
    TCEntry *entry = &tc_slots_[slot];
    *start = entry->head;
    *end = entry->tail;
    lock_.Unlock();
    return N;
  }

        Any heap allocation can end up in this path. So did we perform a new in the child process after fork? Our code looked like this:

Process Process::launch(const char* cmdline,int argc,char * argv[])
{
	Process p;
	int pid = fork();
	if(pid>0){
		p.pid = pid;
		return p;
	}
	else if(pid==0){		
		// Problem: this allocation runs in the child, after fork().
		char ** args = new char*[argc+2];
		args[0] = (char*)cmdline;
		args[argc+1] = NULL;
		for(int i(0);i<argc;++i) {
			args[i+1] = argv[i];
		}
		execvp(cmdline,args);
		delete [] args;
		exit(0);
	}
	return p;
}

        The child therefore blocks inside operator new and never reaches the execvp that follows, which is the deadlock we observed.

        

Searching tcmalloc's official issue tracker shows that the same problem has been reported before:

Originally reported on Google Code with ID 496

What steps will reproduce the problem?
Use tcmalloc in an environment where threads might call fork. The testcase
attached (test-threadfork.c) is a small example that creates a set of threads and each
thread allocates some memory, fork a allocates more memory.
Run the testcase with a higher number of threads and forks to trigger the issue.

What is the expected output? What do you see instead?
The expect output is to no deadlock occurs in the fork and all children process eventually
finish. The tcmalloc contains a bug that some internal locks are left in a undefined
state between fork, leaving the child process in a deadlock state.


What version of the product are you using? On what operating system?
I tested svn version r190 in a PPC64 and X86_64 Linux environment.


Please provide any additional information below.
The issue is the locks defined at src/static_vars.h, Static::pageheap_lock_ and each
lock from Static::CentralFreeListPadded elements, needs to be in a consistent state
in a forked version of a thread. Currently, some race issues might occurs if the following
scenario occurs:

Thread 1                                 |  Thread 2
calls malloc()                           |
\_ tcmalloc lock Static::pageheap_lock_  |
                                         |  calls fork()
                                         |  calls malloc()
                                         |  \_ tcmalloc tries to lock the same lock

The same might occur with any lock from Static::central_cache_ elements as well.

A possible solution, presented in patch gperftools-atfork.patch, is register 2 functions
with pthread_atfork to lock all the locks in the parent just prior the fork() call
and to unlock all the locks after the fork() call on both the parent and child. This
patch fixes the above behavior with the testcase.

I didn't on any other platform, so we might need to add guards on non-unix platforms.
I'm accepting suggestions.
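
The two-thread scenario in the issue can be reproduced without tcmalloc. The sketch below is only an illustration: `g_demo_lock` is a hypothetical stand-in for one of the allocator's internal locks, and all function names are made up for this example. Only the thread that calls fork() exists in the child, so a lock held by any other thread at that moment is never released there. The child uses `pthread_mutex_trylock` rather than a blocking lock so that the demonstration reports the state instead of hanging. This assumes Linux/glibc behavior; POSIX only guarantees async-signal-safe calls in the child of a multithreaded process.

```cpp
#include <pthread.h>
#include <sys/wait.h>
#include <unistd.h>
#include <atomic>

// Hypothetical stand-in for one of the allocator's internal locks.
static pthread_mutex_t g_demo_lock = PTHREAD_MUTEX_INITIALIZER;
static std::atomic<bool> g_locked{false};
static std::atomic<bool> g_release{false};

// "Thread 1" from the issue: takes the lock and holds it across the fork.
static void* holder(void*) {
    pthread_mutex_lock(&g_demo_lock);
    g_locked.store(true);
    while (!g_release.load()) usleep(1000);
    pthread_mutex_unlock(&g_demo_lock);
    return nullptr;
}

// "Thread 2" from the issue: forks while the lock is held.
// Returns true if the child found the lock already taken.
bool child_sees_held_lock() {
    g_locked.store(false);
    g_release.store(false);
    pthread_t t;
    if (pthread_create(&t, nullptr, holder, nullptr) != 0) return false;
    while (!g_locked.load()) usleep(1000);  // wait until the lock is held

    pid_t pid = fork();
    if (pid == 0) {
        // The holder thread does not exist in the child, so nobody here
        // will ever unlock g_demo_lock. A blocking lock would hang forever.
        int rc = pthread_mutex_trylock(&g_demo_lock);
        _exit(rc == 0 ? 0 : 1);  // 1 means "lock was still held"
    }

    int status = 0;
    if (pid > 0) waitpid(pid, &status, 0);
    g_release.store(true);  // let the holder finish in the parent
    pthread_join(t, nullptr);
    return pid > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 1;
}
```

Replacing the trylock with a blocking `pthread_mutex_lock` would leave the child hanging in the same way as the stack trace at the top of this post.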

In the end, we fixed the fault by moving the new operation ahead of the fork. The modified code is:

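Process Process::launch(const char* cmdline, int argc, char * argv[])
{
	Process p;
	// Build the argument vector before fork(), so the child never allocates.
	// Plain new throws std::bad_alloc on failure instead of returning NULL,
	// so no null check is needed here.
	char ** args = new char*[argc + 2];
	args[0] = (char*)cmdline;
	args[argc + 1] = NULL;
	for (int i(0); i<argc; ++i) {
		args[i + 1] = argv[i];
	}
	int pid = fork();
	if (pid>0){
		p.pid = pid;
		delete[] args;
		return p;
	}
	else if (pid == 0){
		// Child: go straight to execvp without touching the heap.
		execvp(cmdline, args);
		// Reached only if execvp failed. Skip delete[] (it would re-enter
		// the allocator) and use _exit so no atexit handlers run.
		_exit(127);
	}
	// fork() failed: release the buffer and return the empty Process.
	delete[] args;
	return p;
}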

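Google has also officially offered a few solutions:

1. Allocate the memory beforehand, then call fork, and call execvp directly in the child. This solves the problem.

2. Use pthread_atfork. This requires knowing in advance which locks the child must release, because the handlers that acquire and release those locks are what you pass in as arguments.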
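
The second option can be sketched for a lock that the application itself owns. This is a minimal illustration, not tcmalloc's actual fix: `g_guarded_lock` and every function name below are hypothetical. `pthread_atfork` takes three handlers: one run in the parent before the fork, one run in the parent after it, and one run in the child after it. Acquiring the lock in the first handler guarantees that no other thread holds it at the moment of the fork, so the child can safely release it. The patch described in the issue above applies the same idea to tcmalloc's internal locks.

```cpp
#include <pthread.h>
#include <sys/wait.h>
#include <unistd.h>
#include <atomic>

// Hypothetical application-owned lock that other threads use constantly.
static pthread_mutex_t g_guarded_lock = PTHREAD_MUTEX_INITIALIZER;
static std::atomic<bool> g_stop{false};

static void before_fork()       { pthread_mutex_lock(&g_guarded_lock); }
static void after_fork_parent() { pthread_mutex_unlock(&g_guarded_lock); }
static void after_fork_child()  { pthread_mutex_unlock(&g_guarded_lock); }

// Register the handlers once at startup. Returns 0 on success.
int install_fork_handlers() {
    return pthread_atfork(before_fork, after_fork_parent, after_fork_child);
}

// Background thread that keeps taking and releasing the lock.
static void* hammer(void*) {
    while (!g_stop.load()) {
        pthread_mutex_lock(&g_guarded_lock);
        pthread_mutex_unlock(&g_guarded_lock);
    }
    return nullptr;
}

// Forks `rounds` times while the hammer thread is running and counts
// how many children were able to take the lock.
int children_that_got_lock(int rounds) {
    g_stop.store(false);
    pthread_t t;
    if (pthread_create(&t, nullptr, hammer, nullptr) != 0) return 0;

    int ok = 0;
    for (int i = 0; i < rounds; ++i) {
        pid_t pid = fork();  // handlers run around this call
        if (pid == 0) {
            int rc = pthread_mutex_trylock(&g_guarded_lock);
            _exit(rc == 0 ? 0 : 1);  // 0 means "lock was free in the child"
        }
        int status = 0;
        if (pid > 0 && waitpid(pid, &status, 0) == pid &&
            WIFEXITED(status) && WEXITSTATUS(status) == 0) {
            ++ok;
        }
    }

    g_stop.store(true);
    pthread_join(t, nullptr);
    return ok;
}
```

With the handlers installed, every child should find the lock free. Application code cannot reach an allocator's internal locks this way, which is why the first option (allocate before fork, then exec immediately) was the practical fix here.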
