Linux Multithreaded Programming: Deadlock Scenarios, Root-Cause Analysis, and Solutions

JiMoKuangXiangQu

Last modified 2023-01-03 12:33:59

First published 2022-04-05 15:27:11

Article link: https://blog.csdn.net/JiMoKuangXiangQu/article/details/123969554

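1. Preface

The author's knowledge is limited, so this article may contain errors. The author accepts no responsibility for any loss they cause the reader.

2. Goal of this article

List deadlock scenarios that arise in pthread programming, briefly analyze the cause of each, and then give a (not necessarily optimal) solution.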

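3. Deadlock scenarios

3.1 Case 1

3.1.1 Scenario

Several threads share a lock, and one thread then calls ***pthread_kill()/pthread_cancel()*** on another, which causes a deadlock. The code below demonstrates the pthread_cancel() case: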

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;

void *thread_func(void *arg)
{
	printf("thread started\n");
	
	for (;;) {
		pthread_mutex_lock(&mtx);
		printf("thread: lock\n");
		sleep(10);
		pthread_mutex_unlock(&mtx);
		printf("thread: unlock\n");
	}
}


int main(void)
{
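	pthread_t pth;
	int err;

	/* pthread_create() returns an error number and does not set errno */
	err = pthread_create(&pth, NULL, thread_func, NULL);
	if (err) {
		fprintf(stderr, "pthread_create: error %d\n", err);
		return -1;
	}

	sleep(3); /* wait a while so that pth takes the lock before the main thread does */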

	printf("stop the thread... ");
	pthread_cancel(pth);
	pthread_join(pth, NULL);
	printf("done.\n");

	printf("main thread: try to get lock...\n");
	pthread_mutex_lock(&mtx);
	printf("main thread: do something with lock\n");
	pthread_mutex_unlock(&mtx);


	printf("main thread: exit\n");
	
	return 0;
}

# Output
$ ./pthread_deadlock 
thread started
thread: lock
stop the thread... done.
main thread: try to get lock...

As the output shows, the main thread waits for the lock forever.

3.1.2 Root-cause analysis

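Unless the thread has been switched to asynchronous cancellation, which lets it be cancelled at any point (set with pthread_setcanceltype()), a pthread_cancel() call sends the target thread a cancellation request, and the thread exits at the next cancellation point it reaches. What is a cancellation point? Put simply, it is a place where a thread acts on a pending cancellation request and exits. pthread_kill() works differently: it delivers a signal to the target thread and is not tied to cancellation points. The library functions that are cancellation points are listed in man 7 pthreads: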
https://man7.org/linux/man-pages/man7/pthreads.7.html
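Search that page for "Cancellation points". Unfortunately, sleep() is on the list, which means the thread may exit inside it. When that happens here, the shared lock is never released, the main thread can never acquire it, and the program deadlocks.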
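The diagnosis can be checked without hanging the process. The sketch below is a reduced variant of the program above: it cancels a thread that sleeps while holding the lock, then probes the lock with pthread_mutex_trylock() instead of blocking on it. The names demo_mtx, holder and lock_state_after_cancel are hypothetical and do not come from the original code, and nanosleep() (also a cancellation point) stands in for sleep() so the delays can be short.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical names, for illustration only. */
static pthread_mutex_t demo_mtx = PTHREAD_MUTEX_INITIALIZER;

static void *holder(void *arg)
{
	struct timespec ts = { 5, 0 };

	(void)arg;
	pthread_mutex_lock(&demo_mtx);
	nanosleep(&ts, NULL);            /* cancellation point: a cancelled thread exits here */
	pthread_mutex_unlock(&demo_mtx); /* not reached when the thread is cancelled */
	return NULL;
}

/*
 * Cancels holder while it owns demo_mtx and returns the result of
 * pthread_mutex_trylock() on that mutex (or -1 if the setup failed).
 */
int lock_state_after_cancel(void)
{
	pthread_t pth;
	void *res;
	struct timespec ts = { 0, 200 * 1000 * 1000 }; /* 200 ms */

	if (pthread_create(&pth, NULL, holder, NULL))
		return -1;

	nanosleep(&ts, NULL); /* give holder time to take the lock */
	pthread_cancel(pth);
	pthread_join(pth, &res);
	if (res != PTHREAD_CANCELED)
		return -1;

	/* The owner is gone, yet the mutex is still marked as locked. */
	return pthread_mutex_trylock(&demo_mtx);
}
```

Calling lock_state_after_cancel() from main() and comparing the result with EBUSY shows that the mutex stays locked after its owner has been cancelled, which is exactly why the blocking pthread_mutex_lock() in the original program never returns.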

3.1.3 Solution

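We can maintain a per-thread nesting count that covers all shared locks. When taking a lock, increment the count; when it goes from 0 to 1, call pthread_setcancelstate() to disable cancellation for the thread. When releasing a lock, decrement the count; when it drops back to 0, re-enable cancellation with pthread_setcancelstate() and call pthread_testcancel(), which checks for a pending ***pthread_cancel()*** request and terminates the thread if there is one. Note that this protects against pthread_cancel() only: signals sent with pthread_kill() are not affected by the cancel state.
This approach has an obvious drawback: when the work done under the lock takes a long time, the thread's exit is delayed by the same amount.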
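The scheme described above might be sketched as follows. This is a minimal illustration under several assumptions, not a definitive implementation: safe_lock(), safe_unlock(), worker() and cancel_then_lock() are hypothetical names, the per-thread counter uses the GCC/Clang __thread extension, and the delays are shortened so the demo finishes quickly.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <time.h>

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical per-thread bookkeeping; __thread is a GCC/Clang extension. */
static __thread int lock_depth;         /* number of shared locks this thread holds */
static __thread int saved_cancel_state; /* cancel state before the first lock */

static void sleep_ms(long ms)
{
	struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
	nanosleep(&ts, NULL);
}

static void safe_lock(pthread_mutex_t *m)
{
	/* 0 -> 1: disable cancellation before the first lock is taken */
	if (lock_depth++ == 0)
		pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &saved_cancel_state);
	pthread_mutex_lock(m);
}

static void safe_unlock(pthread_mutex_t *m)
{
	int unused;

	assert(lock_depth > 0);
	pthread_mutex_unlock(m);
	/* 1 -> 0: restore the cancel state, then act on any pending request */
	if (--lock_depth == 0) {
		pthread_setcancelstate(saved_cancel_state, &unused);
		pthread_testcancel();
	}
}

static void *worker(void *arg)
{
	(void)arg;
	for (;;) {
		safe_lock(&mtx);
		sleep_ms(300); /* a cancellation point, but cancellation is disabled here */
		safe_unlock(&mtx);
	}
	return NULL;
}

/* Returns 0 if the lock is free after the worker has been cancelled. */
int cancel_then_lock(void)
{
	pthread_t pth;
	void *res;
	int rc;

	if (pthread_create(&pth, NULL, worker, NULL))
		return -1;

	sleep_ms(100); /* let the worker take the lock first */
	pthread_cancel(pth);
	pthread_join(pth, &res);
	if (res != PTHREAD_CANCELED)
		return -1;

	rc = pthread_mutex_trylock(&mtx);
	if (rc == 0)
		pthread_mutex_unlock(&mtx);
	return rc;
}
```

In this sketch the cancellation request stays pending while the worker holds the lock, so pthread_join() in cancel_then_lock() returns only after the current locked section has finished. That is the delay the drawback above refers to.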
