This is a wheel I reinvented in my spare time recently. The complete source code is on GitHub at the link below:
https://github.com/CallonHuang/DeadLockCheck
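Deadlocks have long been a headache for developers. The most common way to track one down is gdb, or strace plus gdb (I have written a blog post about that before). So is there a way to detect deadlocks purely from within the application?
This article describes one such method, a deadlock detector built on resource tables: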
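- Intercept the mutex functions, either with macro definitions or by preloading a dynamic library so that it is resolved first
- Before lock: store the lock address, pid, calling function's address, and related information in the request hash table
- After lock: store the lock address, pid, calling function's address, and related information in the owner hash table, and delete the entry from the request hash table
- After unlock: delete the entry from the owner hash table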
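The bookkeeping steps above can be sketched as follows. This is a simplified illustration and not the repository's code: it covers only the macro variant, uses small flat arrays where the original uses hash tables, identifies the caller with `__func__` instead of a return address, and uses `pthread_self()` instead of a pid. The names `Record`, `requests`, `owners`, `AddRecord`, `DelRecord`, `CountRecords`, `CheckedLock`, and `CheckedUnlock` are all hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_RECORDS 64

/* One "thread X is involved with lock Y" entry (hypothetical layout). */
typedef struct {
    int used;
    pthread_mutex_t *lockAddr;
    pthread_t thread;
    const char *where;          /* stand-in for the caller's address */
} Record;

static Record requests[MAX_RECORDS];   /* locks being waited for */
static Record owners[MAX_RECORDS];     /* locks currently held   */
static pthread_mutex_t tableGuard = PTHREAD_MUTEX_INITIALIZER;

/* Store a record in the first free slot; silently dropped if the table is full. */
static void AddRecord(Record *table, pthread_mutex_t *lockAddr, const char *where)
{
    pthread_mutex_lock(&tableGuard);
    for (int i = 0; i < MAX_RECORDS; i++) {
        if (!table[i].used) {
            table[i].used = 1;
            table[i].lockAddr = lockAddr;
            table[i].thread = pthread_self();
            table[i].where = where;
            break;
        }
    }
    pthread_mutex_unlock(&tableGuard);
}

/* Remove the calling thread's record for lockAddr, if present. */
static void DelRecord(Record *table, pthread_mutex_t *lockAddr)
{
    pthread_t self = pthread_self();
    pthread_mutex_lock(&tableGuard);
    for (int i = 0; i < MAX_RECORDS; i++) {
        if (table[i].used && table[i].lockAddr == lockAddr &&
            pthread_equal(table[i].thread, self)) {
            table[i].used = 0;
            break;
        }
    }
    pthread_mutex_unlock(&tableGuard);
}

/* Count the records that refer to lockAddr (useful for inspection). */
static int CountRecords(Record *table, pthread_mutex_t *lockAddr)
{
    int n = 0;
    pthread_mutex_lock(&tableGuard);
    for (int i = 0; i < MAX_RECORDS; i++) {
        if (table[i].used && table[i].lockAddr == lockAddr) {
            n++;
        }
    }
    pthread_mutex_unlock(&tableGuard);
    return n;
}

int CheckedLock(pthread_mutex_t *m, const char *where)
{
    assert(m != NULL);
    AddRecord(requests, m, where);      /* before lock: record the request */
    int ret = pthread_mutex_lock(m);
    if (ret == 0) {
        AddRecord(owners, m, where);    /* after lock: record the ownership */
    }
    DelRecord(requests, m);             /* the request is no longer pending */
    return ret;
}

int CheckedUnlock(pthread_mutex_t *m)
{
    int ret = pthread_mutex_unlock(m);
    if (ret == 0) {
        DelRecord(owners, m);           /* after unlock: drop the ownership */
    }
    return ret;
}

/* From here on, ordinary calls are routed through the wrappers. */
#define pthread_mutex_lock(m)   CheckedLock((m), __func__)
#define pthread_mutex_unlock(m) CheckedUnlock(m)
```

Code compiled after the two macros keeps calling `pthread_mutex_lock` and `pthread_mutex_unlock` as usual, so no call sites need to change. A thread stuck inside the real lock call leaves its entry in `requests`, which is what the detection step later relies on.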
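With these hooks in place, the tables hold all the information about every lock before and after it is acquired and released. When a deadlock occurs, walk the nodes in the request hash table one by one and look each of them up in the owner hash table, tracing back along the chain to determine whether a deadlock cycle exists.
The core function of the algorithm is as follows: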
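As a rough illustration of that traversal, the sketch below follows the chain "thread waits for a lock, the lock is owned by another thread, that thread waits for another lock" until it either runs out of links or returns to a thread it has already seen. It is a simplification under stated assumptions: threads and locks are plain integers, the two tables are flat arrays instead of hash tables, and `Edge`, `RequestedLock`, `OwnerOf`, and `HasLockCycle` are hypothetical names that do not appear in the repository.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_THREADS 16
#define NO_ENTRY (-1)

/* A (thread, lock) pair: "thread waits for lock" or "thread owns lock". */
typedef struct {
    int thread;
    int lock;
} Edge;

/* Return the lock that `thread` is waiting for, or NO_ENTRY. */
static int RequestedLock(const Edge *requests, size_t n, int thread)
{
    for (size_t i = 0; i < n; i++) {
        if (requests[i].thread == thread) {
            return requests[i].lock;
        }
    }
    return NO_ENTRY;
}

/* Return the thread that currently owns `lock`, or NO_ENTRY. */
static int OwnerOf(const Edge *owners, size_t n, int lock)
{
    for (size_t i = 0; i < n; i++) {
        if (owners[i].lock == lock) {
            return owners[i].thread;
        }
    }
    return NO_ENTRY;
}

/* Return 1 if following the wait chain from startThread reaches a cycle. */
int HasLockCycle(const Edge *requests, size_t nReq,
                 const Edge *owners, size_t nOwn, int startThread)
{
    int visited[MAX_THREADS] = {0};
    int thread = startThread;

    while (thread != NO_ENTRY) {
        assert(thread >= 0 && thread < MAX_THREADS);
        if (visited[thread]) {
            return 1;               /* came back to a thread on this path */
        }
        visited[thread] = 1;

        int lock = RequestedLock(requests, nReq, thread);
        if (lock == NO_ENTRY) {
            return 0;               /* this thread is not blocked */
        }
        thread = OwnerOf(owners, nOwn, lock);   /* NO_ENTRY ends the loop */
    }
    return 0;                       /* requested lock has no owner */
}
```

Because a blocked thread waits for exactly one lock at a time, each thread has at most one outgoing link, so a single walk per starting thread is enough and no general graph search is needed.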
/*
 * Part of the algorithm that finds lock cycles.
 * brief: traverse the request list and the owner list to find the wait-for path
 * param:
 *   [in] requestIndex : index into the request list, as returned by [SelectRequest], whose list still has unvisited nodes
 *   [in&out] visitArray : number of visited nodes for each index of the request list; lets [SelectRequest] run fast
 *   [in] visitKey : marks each [RequestTrace] run; if [VisitRequest] returns 'VISITED_BEFORE', the trace reached a node that an earlier [RequestTrace] already visited
 * return:
 *   'NOT_FOUND' : the owner is not requesting any lock now / the request list at this index has no more unvisited nodes
 *   'VISITED_BEFORE' : the trace reached a node that was visited in a different [RequestTrace]
 *   'REVISITED' : the trace reached a node that was already visited in the same [RequestTrace]
 */
static int RequestTrace(int requestIndex, int *visitArray, int visitKey)
{
    DEAD_LOCK_INFO *targetNode = NULL, *currNode = NULL, traceStackList;
    int ownerIndex = 0, ret = OK;
    pid_t pid = UNSPECIFIED_PID;

    ListInit(&traceStackList.list);
    do {
        if (OK != (ret = VisitRequest(requestIndex, pid, visitArray, &targetNode, visitKey))) {
            if (UNSPECIFIED_PID == pid) {
                /* mark this index as fully visited so the same requestIndex is not selected again and again */
                visitArray[requestIndex] = requestTable[requestIndex].list.count;
                break;
            }
            /* the trace ends here: push the last owner found, then print the path */
            if (NULL != currNode) {
                InsertStack(&traceStackList, currNode->pid, currNode->lockAddr, currNode->retFuncAddr);
            }
            if (NOT_FOUND == ret || VISITED_BEFORE == ret) {
                printf("request wait owner as below:\n");
            } else {
                printf("find lock circle as below:\n");
            }
            LIST_FOR_EACH(DEAD_LOCK_INFO, currNode, traceStackList.list) {
                if (currNode != (DEAD_LOCK_INFO *)traceStackList.list.node.next) {
                    printf("->");
                }
                printf("[pid: %d func: %p]", currNode->pid, currNode->retFuncAddr);
            }
            printf("\n");
            break;
        }
        /* record the request on the path, then look up the owner of the requested lock */
        InsertStack(&traceStackList, targetNode->pid, targetNode->lockAddr, targetNode->retFuncAddr);
        ownerIndex = Hash32((unsigned long)targetNode->lockAddr, MAX_HASH_BITS);
        LIST_FOR_EACH(DEAD_LOCK_INFO, currNode, ownerTable[ownerIndex].list) {
            if (currNode->lockAddr == targetNode->lockAddr) {
                break;
            }
        }
        if (currNode != (DEAD_LOCK_INFO *)&ownerTable[ownerIndex].list.node) {
            /* owner found: continue the trace from the owner's own pending request */
            pid = currNode->pid;
            requestIndex = Hash32((unsigned long)currNode->pid, MAX_HASH_BITS);
        } else {
            /* the request is valid, but no owner exists for this lock */
            ret = NOT_FOUND;
            break;
        }
    } while (1);
    ListDestroy(&traceStackList.list);
    return ret;
}
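The difference between 'VISITED_BEFORE' and 'REVISITED' described in the header comment can be illustrated with a per-node stamp. The sketch below is only a guess at how such a check could work; it is not the repository's `VisitRequest`. The function `Visit`, the `stamp` variable, the extra `FIRST_VISIT` value, and the numeric enum values are all hypothetical and chosen just for this example.

```c
#include <assert.h>

/* Hypothetical result codes; the real project defines its own values. */
enum { FIRST_VISIT = 0, VISITED_BEFORE = 1, REVISITED = 2 };

/*
 * Each node carries a stamp: 0 means "never visited", any other value is
 * the key of the trace that visited it. Every trace uses a distinct key.
 */
static int Visit(int *stamp, int visitKey)
{
    assert(visitKey != 0);          /* 0 is reserved for "unvisited" */
    if (*stamp == visitKey) {
        return REVISITED;           /* same trace came back: a cycle */
    }
    if (*stamp != 0) {
        return VISITED_BEFORE;      /* an earlier trace already covered it */
    }
    *stamp = visitKey;
    return FIRST_VISIT;
}
```

With this scheme, only a return to a node stamped by the current trace counts as a lock cycle. Reaching a node stamped by an earlier trace means the current path merely joins a chain that has already been examined.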