Debugging notes: thread stuck (deadlocked) inside pthread_cond_destroy

Problem

In vendor code, while a thread was shutting down, it got stuck in the middle of destroying a condition variable.


The man page

According to the man page, destroying a cond on which other threads are currently blocked results in undefined behavior:

pthread_cond_destroy()
It  shall be safe to destroy an initialized condition variable upon which no threads are currently blocked. Attempting to destroy a condition variable upon which other threads are currently blocked results in undefined behavior.

So before destroying the condition variable, first call pthread_cond_broadcast(&pEvent->cond) to wake every waiting thread:

int32_t osEventDestroy(osEvent *pEvent)
{
    pthread_mutex_lock(&pEvent->mutex);
    pthread_cond_broadcast(&pEvent->cond);
    pthread_mutex_unlock(&pEvent->mutex);

    pthread_cond_destroy(&pEvent->cond);
    pthread_mutex_destroy(&pEvent->mutex);

    return 0;
}
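Note that the broadcast only guarantees safe destruction if woken waiters actually leave pthread_cond_wait and do not re-enter it, which is the same requirement the glibc comment quoted below spells out. As a hedged illustration, the waiting side could be built around an extra "exiting" flag (the fields signaled and exiting and the function osEventWait are assumptions for this sketch, not the vendor API):

/* Sketch of the waiting side (hypothetical osEvent fields, not the vendor code). */
int32_t osEventWait(osEvent *pEvent)
{
    pthread_mutex_lock(&pEvent->mutex);
    /* Loop on the predicate: pthread_cond_wait may wake spuriously, and the
       broadcast issued by osEventDestroy must also release this thread even
       though no real event occurred. */
    while (!pEvent->signaled && !pEvent->exiting)
        pthread_cond_wait(&pEvent->cond, &pEvent->mutex);
    if (pEvent->signaled)
        pEvent->signaled = 0;   /* consume the event */
    pthread_mutex_unlock(&pEvent->mutex);
    return 0;
}

The destroy path would then set pEvent->exiting = 1 while still holding the mutex, before the broadcast, so that every woken waiter falls out of the loop instead of blocking on the cond again.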

After retesting, the problem still showed up occasionally.

Looking at the glibc source

The source comments explain that __pthread_cond_destroy assumes any remaining waiters have already been woken, and it waits until they confirm this by decrementing the __wrefs field:


/* See __pthread_cond_wait for a high-level description of the algorithm.

   A correct program must make sure that no waiters are blocked on the condvar
   when it is destroyed, and that there are no concurrent signals or
   broadcasts.  To wake waiters reliably, the program must signal or
   broadcast while holding the mutex or after having held the mutex.  It must
   also ensure that no signal or broadcast are still pending to unblock
   waiters; IOW, because waiters can wake up spuriously, the program must
   effectively ensure that destruction happens after the execution of those
   signal or broadcast calls.
   Thus, we can assume that all waiters that are still accessing the condvar
   have been woken.  We wait until they have confirmed to have woken up by
   decrementing __wrefs.  */
int
__pthread_cond_destroy (pthread_cond_t *cond)
{
  LIBC_PROBE (cond_destroy, 1, cond);

  /* Set the wake request flag.  We could also spin, but destruction that is
     concurrent with still-active waiters is probably neither common nor
     performance critical.  Acquire MO to synchronize with waiters confirming
     that they finished.  */
  unsigned int wrefs = atomic_fetch_or_acquire (&cond->__data.__wrefs, 4);
  int private = __condvar_get_private (wrefs);
  while (wrefs >> 3 != 0)
    {
      futex_wait_simple (&cond->__data.__wrefs, wrefs, private);
      /* See above.  */
      wrefs = atomic_load_acquire (&cond->__data.__wrefs);
    }
  /* The memory the condvar occupies can now be reused.  */
  return 0;
}

Printing __wrefs just before the destroy showed -8, which should be impossible: the destroy loop above treats __wrefs >> 3 as the number of remaining waiters (in glibc's pthread_cond_wait each waiter adds 8 on entry and subtracts 8 on exit, the low three bits being flags), so the count should never go negative. With an underflowed count, wrefs >> 3 never reaches zero and __pthread_cond_destroy blocks in futex_wait forever. Forcing the field to zero made the hang disappear:

int32_t osEventDestroy(osEvent *pEvent)
{
    pthread_mutex_lock(&pEvent->mutex);
    pthread_cond_broadcast(&pEvent->cond);
    pthread_mutex_unlock(&pEvent->mutex);

    /* Workaround: if the waiter reference count is not zero here, log it and
       force it to zero so pthread_cond_destroy cannot block. This only hides
       the symptom; the real cause is tracked down below. */
    if (0 != pEvent->cond.__data.__wrefs)
    {
        OSLAYER_ERR("%p %s cond error with refs %d\n",
                    pEvent, __func__, pEvent->cond.__data.__wrefs);
        pEvent->cond.__data.__wrefs = 0;
    }

    pthread_cond_destroy(&pEvent->cond);
    pthread_mutex_destroy(&pEvent->mutex);

    return 0;
}

Tracking down the root cause

Going through the code, no other thread was still using this condition variable, so why was the counter corrupted?
It turned out that the code started the waiting thread first and only initialized the cond afterwards: pthread_cond_wait was already executing on the condition variable before pthread_cond_init had been called on it.
That would explain the observed -8: the waiter had already added 8 to __wrefs when pthread_cond_init zeroed the structure, so the waiter's later decrement underflowed the counter to -8, and __pthread_cond_destroy then waits forever for a "waiter" that will never confirm its wakeup.

After reordering the code so that the condition variable is initialized before the waiting thread is started, the problem disappeared.
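A minimal sketch of the corrected ordering (osEventInit, eventThread, startEvent and the osEvent layout shown here are made up for illustration; the vendor code is not reproduced):

#include <pthread.h>
#include <stddef.h>

typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    int             signaled;
} osEvent;

/* Initialize the event BEFORE any thread can reach pthread_cond_wait. */
static void osEventInit(osEvent *pEvent)
{
    pthread_mutex_init(&pEvent->mutex, NULL);
    pthread_cond_init(&pEvent->cond, NULL);
    pEvent->signaled = 0;
}

static void *eventThread(void *arg)
{
    osEvent *pEvent = arg;
    pthread_mutex_lock(&pEvent->mutex);
    while (!pEvent->signaled)
        pthread_cond_wait(&pEvent->cond, &pEvent->mutex);
    pthread_mutex_unlock(&pEvent->mutex);
    return NULL;
}

int startEvent(osEvent *pEvent, pthread_t *tid)
{
    /* Buggy order: pthread_create() first, osEventInit() afterwards -- the new
       thread may already sit inside pthread_cond_wait (having incremented
       __wrefs) when pthread_cond_init wipes the structure back to zero. */
    osEventInit(pEvent);                                     /* init first  */
    return pthread_create(tid, NULL, eventThread, pEvent);  /* then start  */
}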

Conclusion:

Using an uninitialized condition variable does not make the pthread functions return an error, but it can corrupt the condvar's internal state and cause failures later (here, a permanent hang in pthread_cond_destroy).
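For statically allocated events, the pthread static initializers remove the hazardous window entirely, since the condition variable is valid from the moment the object exists. A minimal sketch, reusing the osEvent layout assumed above:

#include <pthread.h>

typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    int             signaled;
} osEvent;

/* Valid before main() runs; no thread can ever see an uninitialized cond. */
static osEvent gEvent = {
    .mutex    = PTHREAD_MUTEX_INITIALIZER,
    .cond     = PTHREAD_COND_INITIALIZER,
    .signaled = 0,
};

Dynamically allocated events still need pthread_mutex_init/pthread_cond_init, so the ordering rule from the previous section applies to them.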
