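Problem
In vendor code, a thread blocks while destroying a condition variable as part of its exit path.
Consulting the manual
According to the man page, destroying a cond that other threads are still waiting on results in undefined behavior: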
pthread_cond_destroy()
It shall be safe to destroy an initialized condition variable upon which no threads are currently blocked. Attempting to destroy a condition variable upon which other threads are currently blocked results in undefined behavior.
So before destroying the cond, call pthread_cond_broadcast(&pEvent->cond) to wake every waiting thread:
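int32_t osEventDestroy(osEvent *pEvent)
{
    /* Wake every waiter before tearing the event down. */
    pthread_mutex_lock(&pEvent->mutex);
    pthread_cond_broadcast(&pEvent->cond);
    pthread_mutex_unlock(&pEvent->mutex);

    pthread_cond_destroy(&pEvent->cond);
    pthread_mutex_destroy(&pEvent->mutex);
    return 0; /* assumed: the original excerpt is cut off before the return */
}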
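After retesting, the problem still occurred intermittently.
Reading the source
The comment in the glibc source says that __pthread_cond_destroy assumes every waiter still accessing the condvar has already been woken, so it blocks until the waiter reference count held in __wrefs drops to zero: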
/* See __pthread_cond_wait for a high-level description of the algorithm.

   A correct program must make sure that no waiters are blocked on the condvar
   when it is destroyed, and that there are no concurrent signals or
   broadcasts. To wake waiters reliably, the program must signal or
   broadcast while holding the mutex or after having held the mutex. It must
   also ensure that no signal or broadcast are still pending to unblock
   waiters; IOW, because waiters can wake up spuriously, the program must
   effectively ensure that destruction happens after the execution of those
   signal or broadcast calls.
   Thus, we can assume that all waiters that are still accessing the condvar
   have been woken. We wait until they have confirmed to have woken up by
   decrementing __wrefs. */
int
__pthread_cond_destroy (pthread_cond_t *cond)
{
  LIBC_PROBE (cond_destroy, 1, cond);

  /* Set the wake request flag. We could also spin, but destruction that is
     concurrent with still-active waiters is probably neither common nor
     performance critical. Acquire MO to synchronize with waiters confirming
     that they finished. */
  unsigned int wrefs = atomic_fetch_or_acquire (&cond->__data.__wrefs, 4);
  int private = __condvar_get_private (wrefs);
  while (wrefs >> 3 != 0)
    {
      futex_wait_simple (&cond->__data.__wrefs, wrefs, private);
      /* See above. */
      wrefs = atomic_load_acquire (&cond->__data.__wrefs);
    }
  /* The memory the condvar occupies can now be reused. */
  return 0;
}
Printing __wrefs just before the destroy call showed -8, which should never happen. Forcing the field back to zero made the hang disappear:
int32_t osEventDestroy(osEvent *pEvent)
{
    pthread_mutex_lock(&pEvent->mutex);
    pthread_cond_broadcast(&pEvent->cond);
    pthread_mutex_unlock(&pEvent->mutex);

    /* Diagnostic workaround only: this pokes at glibc-internal state. */
    if (0 != pEvent->cond.__data.__wrefs)
    {
        OSLAYER_ERR("%p %s cond error with refs %d\n", pEvent, __func__, pEvent->cond.__data.__wrefs);
        pEvent->cond.__data.__wrefs = 0;
    }

    pthread_cond_destroy(&pEvent->cond);
    pthread_mutex_destroy(&pEvent->mutex);
    return 0; /* assumed: the original excerpt is cut off before the return */
}
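To see why a printed value of -8 keeps pthread_cond_destroy blocked, it helps to redo the arithmetic of the loop condition from the glibc excerpt. The sketch below only mirrors the `wrefs >> 3` check; the helper names wrefs_from_printed and wrefs_refcount are made up for illustration and are not part of glibc.

```c
#include <stdio.h>

/* Hypothetical helper: __wrefs is an unsigned int, so a value that a %d
   format printed as negative is really a large unsigned number. */
static unsigned int wrefs_from_printed(int printed)
{
    return (unsigned int)printed;
}

/* Mirrors the loop condition in __pthread_cond_destroy: the low 3 bits are
   flags (the destroy path sets bit value 4), the bits above them count
   waiter references. */
static unsigned int wrefs_refcount(unsigned int wrefs)
{
    return wrefs >> 3;
}

/* Hypothetical helper: print the reinterpreted value and its reference count. */
static void wrefs_describe(int printed)
{
    unsigned int wrefs = wrefs_from_printed(printed);
    printf("printed=%d raw=0x%X refcount=0x%X\n",
           printed, wrefs, wrefs_refcount(wrefs));
}
```

With a 32-bit unsigned int, wrefs_from_printed(-8) is 0xFFFFFFF8 and its reference count is 0x1FFFFFFF. That is nonzero, so the while loop calls futex_wait_simple and waits for waiters that do not exist, which matches the observed hang.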
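Tracking down the cause
A review of the code showed no other threads using the variable, so why was the value corrupted?
It turned out that the code started the waiting thread, which calls pthread_cond_wait, before it initialized the cond.
In other words, the condition variable was still uninitialized when it was passed to pthread_cond_wait, and pthread_cond_init was only called after the wait had already begun.
This ordering would plausibly explain the -8: the waiter adds 8 to __wrefs on entry, pthread_cond_init then resets the field to 0, and the waiter subtracts 8 again when it wakes up.
After reordering the code so that initialization happens first, the problem disappeared.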
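The corrected ordering can be sketched as follows. This is a minimal illustration and not the vendor's code: the demo_event type, its signaled predicate, and the demo_waiter and demo_run functions are all hypothetical, since the layout of osEvent is not shown in the original.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical event type; stands in for the vendor's osEvent. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    bool            signaled;  /* predicate, guards against spurious wakeups */
} demo_event;

static void *demo_waiter(void *arg)
{
    demo_event *ev = arg;

    pthread_mutex_lock(&ev->mutex);
    while (!ev->signaled)
        pthread_cond_wait(&ev->cond, &ev->mutex);
    pthread_mutex_unlock(&ev->mutex);
    return NULL;
}

/* Returns 0 on success, or the first pthread error code encountered. */
static int demo_run(void)
{
    demo_event ev;
    pthread_t tid;
    int rc;

    /* 1. Initialize everything BEFORE any thread can touch the condvar. */
    ev.signaled = false;
    if ((rc = pthread_mutex_init(&ev.mutex, NULL)) != 0)
        return rc;
    if ((rc = pthread_cond_init(&ev.cond, NULL)) != 0) {
        pthread_mutex_destroy(&ev.mutex);
        return rc;
    }

    /* 2. Only now start the thread that waits on it. */
    rc = pthread_create(&tid, NULL, demo_waiter, &ev);
    if (rc == 0) {
        /* 3. Set the predicate and wake the waiter while holding the mutex. */
        pthread_mutex_lock(&ev.mutex);
        ev.signaled = true;
        pthread_cond_broadcast(&ev.cond);
        pthread_mutex_unlock(&ev.mutex);

        /* 4. Join, so no thread is still inside pthread_cond_wait. */
        rc = pthread_join(tid, NULL);
    }

    /* 5. Destroy only after all waiters are gone. */
    pthread_cond_destroy(&ev.cond);
    pthread_mutex_destroy(&ev.mutex);
    return rc;
}
```

Joining the waiter before the destroy call is a design choice of this sketch: it guarantees that no thread still holds a reference to the condvar, so there is no need to inspect or patch __wrefs.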
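Conclusion:
Using an uninitialized condition variable does not make the pthread functions return an error, but it can corrupt the variable's internal state and cause misbehavior later, such as a hang in pthread_cond_destroy.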