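I recently ran into a case where shared-memory corruption caused a regular lock release to fail with the warning "you don't own a lock of type".
This post reviews the basic concepts behind regular locks and then walks through the lock release path.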
- SpinLock:❎
- LWLock:❎
- RegularLock:✅
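Review of basic concepts
- LockMethodLockHash: the main lock table, a global hash table in shared memory that is visible to all processes.
  - A transaction accesses it when acquiring a strong lock, checking for lock conflicts, or otherwise coordinating with other transactions.
- FastPathStrongRelationLocks: the strong-lock marker table, an array of counters in shared memory used to tell quickly whether a strong lock may exist.
  - It quickly filters whether a weak lock has to go through the main lock table, which reduces contention on shared memory.
- LockMethodLocalHash: the local lock table, kept in process-local memory and maintained independently by each backend.
  - It serves the fast acquire and release of high-frequency weak locks (such as those taken by DML).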
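About fastpath
- First check eligibility with EligibleForRelationFastPath: for example, the lock level must be below 4 (weaker than ShareUpdateExclusiveLock) and the lock must be a relation lock.
- Then consult FastPathStrongRelationLocks in shared memory to see whether a strong lock has already been registered for this tag (the counters are indexed by the tag's hash code); if so, fastpath cannot be used.
- If both checks pass, call FastPathGrantRelationLock to take the fast-path lock. The lock information is recorded in PGPROC.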
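The two checks above can be sketched in a few lines of standalone C. This is a simplified illustration, not the PostgreSQL source: the `Demo*` types, the `demo_*` functions, and the constant names are hypothetical stand-ins, and the 1024-entry counter array is an assumption based on my reading of lock.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the PostgreSQL lock mode numbers. */
enum { ACCESS_SHARE = 1, ROW_SHARE = 2, ROW_EXCLUSIVE = 3, SHARE_UPDATE_EXCLUSIVE = 4 };
enum { TAG_RELATION = 0 };      /* locktag_type of a relation lock */
enum { DEFAULT_METHOD = 1 };    /* locktag_lockmethodid of a regular lock */

/* Assumed number of strong-lock counters. */
#define STRONG_LOCK_PARTITIONS 1024

typedef struct DemoLockTag {
    uint32_t db_oid;     /* locktag_field1: database OID, 0 for shared relations */
    uint32_t rel_oid;    /* locktag_field2: relation OID */
    uint8_t  type;       /* locktag_type */
    uint8_t  method_id;  /* locktag_lockmethodid */
} DemoLockTag;

/* One counter per hash partition: how many strong locks are registered. */
static uint32_t strong_lock_counts[STRONG_LOCK_PARTITIONS];

/* Check 1: only weak relation locks in the backend's own database qualify. */
static bool demo_eligible_for_fast_path(const DemoLockTag *tag, int mode,
                                        uint32_t my_db_oid)
{
    return tag->method_id == DEFAULT_METHOD &&
           tag->type == TAG_RELATION &&
           tag->db_oid == my_db_oid &&
           my_db_oid != 0 &&
           mode < SHARE_UPDATE_EXCLUSIVE;
}

/* Check 2: no strong lock may be registered in the tag's hash partition. */
static bool demo_can_use_fast_path(const DemoLockTag *tag, int mode,
                                   uint32_t my_db_oid, uint32_t hashcode)
{
    if (!demo_eligible_for_fast_path(tag, mode, my_db_oid))
        return false;
    return strong_lock_counts[hashcode % STRONG_LOCK_PARTITIONS] == 0;
}
```

Note that a tag whose database OID is 0 never passes the first check, whatever the lock mode.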
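The FastPathGrantRelationLock function:
- It uses the PGPROC array Oid fpRelId[16]; to store relation OIDs.
- It uses the PGPROC field uint64 fpLockBits as a bitmap that records lock levels.
FastPathGrantRelationLock logic:
- Scan the bitmap in groups of 3 bits, in order. If a group is empty, remember its position. If it is not empty, check whether the OID stored for that slot is the one requested; if it is, OR the requested lock mode into the group and return.
- If the scan finds no slot with a matching OID, take an empty slot, record the lock level in the bitmap and the OID in the array, then return.
- If the scan finds no empty slot either, return false.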
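The slot scan described above can be sketched as follows. This is a minimal standalone model, assuming 16 slots of 3 bits each as stated in the text; `DemoProc` and the `demo_*` helpers are hypothetical names, and details such as locking and the local use counter are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_SLOTS 16          /* slots per backend, as in the text */
#define DEMO_BITS_PER_SLOT 3   /* one bit for each of lock modes 1..3 */

typedef struct DemoProc {
    uint64_t fp_lock_bits;            /* 16 groups of 3 bits */
    uint32_t fp_rel_id[DEMO_SLOTS];   /* relation OID per slot */
} DemoProc;

/* Bit for (slot, mode); mode 1 maps to the first bit of the group. */
static uint64_t demo_bit(int slot, int mode)
{
    return UINT64_C(1) << (slot * DEMO_BITS_PER_SLOT + (mode - 1));
}

/* The 3 bits that belong to one slot. */
static uint64_t demo_slot_bits(const DemoProc *proc, int slot)
{
    return (proc->fp_lock_bits >> (slot * DEMO_BITS_PER_SLOT)) & 0x7;
}

static bool demo_fast_path_grant(DemoProc *proc, uint32_t relid, int mode)
{
    int unused = DEMO_SLOTS;

    for (int f = 0; f < DEMO_SLOTS; f++) {
        if (demo_slot_bits(proc, f) == 0) {
            unused = f;                              /* remember an empty slot */
        } else if (proc->fp_rel_id[f] == relid) {
            proc->fp_lock_bits |= demo_bit(f, mode); /* same relation: OR in the mode */
            return true;
        }
    }
    if (unused < DEMO_SLOTS) {                       /* no match: claim an empty slot */
        proc->fp_rel_id[unused] = relid;
        proc->fp_lock_bits |= demo_bit(unused, mode);
        return true;
    }
    return false;                                    /* every slot is taken */
}

static bool demo_fast_path_ungrant(DemoProc *proc, uint32_t relid, int mode)
{
    for (int f = 0; f < DEMO_SLOTS; f++) {
        if (proc->fp_rel_id[f] == relid &&
            (proc->fp_lock_bits & demo_bit(f, mode))) {
            proc->fp_lock_bits &= ~demo_bit(f, mode);
            return true;
        }
    }
    return false;
}
```

When `demo_fast_path_grant` returns false, the caller would have to fall back to the main lock table, which is the situation the release path later has to handle.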
Detailed walkthrough of LockRelease
bool
LockRelease(const LOCKTAG *locktag, LOCKMODE lockmode, bool sessionLock)
{
LOCKMETHODID lockmethodid = locktag->locktag_lockmethodid;
LockMethod lockMethodTable;
LOCALLOCKTAG localtag;
LOCALLOCK *locallock;
LOCK *lock;
PROCLOCK *proclock;
LWLock *partitionLock;
bool wakeupNeeded;
...
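Step 1: look up the local lock table.
The local lock table is checked first. Every lock this backend has acquired must have an entry there; if none is found, the warning you don't own a lock of type is raised. This check rarely fails, because the local lock table LockMethodLocalHash does not live in shared memory and is unlikely to be corrupted.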
MemSet(&localtag, 0, sizeof(localtag)); /* must clear padding */
localtag.lock = *locktag;
localtag.mode = lockmode;
locallock = (LOCALLOCK *) hash_search(LockMethodLocalHash,
&localtag,
HASH_FIND, NULL);
if (!locallock || locallock->nLocks <= 0)
{
elog(WARNING, "you don't own a lock of type %s",
lockMethodTable->lockModeNames[lockmode]);
return false;
}
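Step 2: decrement the local lock reference count
- A lock may be acquired several times within one transaction, but only the first acquisition goes through fastpath or the main lock table; later acquisitions only touch the local lock table.
- So on release, if the lock was acquired locally more than once, decrementing the count in the local lock table is enough.
- There is also a per-resowner mechanism: savepoints can create several levels of subtransactions inside a transaction, in which case the lock is recorded under several resowners. The loop therefore walks the resowners to find the one used at acquisition time, and calls ResourceOwnerForgetLock once that owner's count drops to zero.
- After an entry is removed, the last array element is moved into the hole so that the array stays compact.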
{
LOCALLOCKOWNER *lockOwners = locallock->lockOwners;
ResourceOwner owner;
int i;
/* Identify owner for lock */
if (sessionLock)
owner = NULL;
else
owner = CurrentResourceOwner;
for (i = locallock->numLockOwners - 1; i >= 0; i--)
{
if (lockOwners[i].owner == owner)
{
Assert(lockOwners[i].nLocks > 0);
if (--lockOwners[i].nLocks == 0)
{
if (owner != NULL)
ResourceOwnerForgetLock(owner, locallock);
/* compact out unused slot */
locallock->numLockOwners--;
if (i < locallock->numLockOwners)
lockOwners[i] = lockOwners[locallock->numLockOwners];
}
break;
}
}
if (i < 0)
{
/* don't release a lock belonging to another owner */
elog(WARNING, "you don't own a lock of type %s",
lockMethodTable->lockModeNames[lockmode]);
return false;
}
}
locallock->nLocks--;
if (locallock->nLocks > 0)
return true;
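Step 3: on the final release, the lock must really be released.
Try a fastpath release first. A fast-path lock is recorded only in this backend's own PGPROC, so if it can be released there, the main lock table does not need to be touched.
The LWLock MyProc->fpInfoLock is needed because a process acquiring a strong lock scans the weak locks recorded in the fastpath area of every process's PGPROC and transfers the conflicting ones to the main lock table. The fastpath data can therefore be modified concurrently by other processes, so it must be protected by an LWLock.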
...
if (EligibleForRelationFastPath(locktag, lockmode) &&
FastPathLocalUseCount > 0)
{
bool released;
LWLockAcquire(&MyProc->fpInfoLock, LW_EXCLUSIVE);
released = FastPathUnGrantRelationLock(locktag->locktag_field2,
lockmode);
LWLockRelease(&MyProc->fpInfoLock);
if (released)
{
RemoveLocalLock(locallock);
return true;
}
}
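Step 4: main lock table operations.
If the fastpath release did not happen, the lock is in the main lock table.
[Shared memory] Main lock table: LockMethodLockHash (stores all lock objects)
[Shared memory] Lock-process relation table: LockMethodProcLockHash (used to find which processes the current process blocks, and for deadlock detection)
- Start working on the main lock table. To increase concurrency it is partitioned by hashcode, so locks in different partitions can be handled concurrently.
- Locate the lock object in the main lock table.
- Locate the PROCLOCK in LockMethodProcLockHash, which is used to find which processes the current process blocks.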
partitionLock = LockHashPartitionLock(locallock->hashcode);
LWLockAcquire(partitionLock, LW_EXCLUSIVE);
lock = locallock->lock;
if (!lock)
{
PROCLOCKTAG proclocktag;
Assert(EligibleForRelationFastPath(locktag, lockmode));
lock = (LOCK *) hash_search_with_hash_value(LockMethodLockHash,
locktag,
locallock->hashcode,
HASH_FIND,
NULL);
if (!lock)
elog(ERROR, "failed to re-find shared lock object");
locallock->lock = lock;
proclocktag.myLock = lock;
proclocktag.myProc = MyProc;
locallock->proclock = (PROCLOCK *) hash_search(LockMethodProcLockHash,
&proclocktag,
HASH_FIND,
NULL);
if (!locallock->proclock)
elog(ERROR, "failed to re-find shared proclock object");
}
LOCK_PRINT("LockRelease: found", lock, lockmode);
proclock = locallock->proclock;
PROCLOCK_PRINT("LockRelease: found", proclock);
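To make the partitioning concrete, here is a small sketch of how a hash code could select a partition lock. It assumes 16 partitions and a plain modulo, which matches my understanding of the default build but should be treated as an assumption; `DemoPartitionLock` and the `demo_*` names are hypothetical, and the struct only stands in for a real LWLock.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NUM_LOCK_PARTITIONS 16   /* assumed partition count */

/* Stand-in for an LWLock; only its identity matters in this sketch. */
typedef struct DemoPartitionLock {
    int held_exclusive;
} DemoPartitionLock;

static DemoPartitionLock demo_partition_locks[DEMO_NUM_LOCK_PARTITIONS];

/* Map a lock tag's hash code to a partition number. */
static uint32_t demo_lock_hash_partition(uint32_t hashcode)
{
    return hashcode % DEMO_NUM_LOCK_PARTITIONS;
}

/* Return the lock that protects the partition owning this hash code. */
static DemoPartitionLock *demo_lock_hash_partition_lock(uint32_t hashcode)
{
    return &demo_partition_locks[demo_lock_hash_partition(hashcode)];
}
```

Two lock tags whose hash codes fall into different partitions are protected by different locks, so backends working on them do not serialize on each other.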
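If the check below fails, the entries were found in both the main lock table and LockMethodProcLockHash, but proclock->holdMask does not contain the lockmode being released. A warning is raised and the local lock is removed, but the information in the main lock table is left untouched.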
/*
* Double-check that we are actually holding a lock of the type we want to
* release.
*/
if (!(proclock->holdMask & LOCKBIT_ON(lockmode)))
{
PROCLOCK_PRINT("LockRelease: WRONGTYPE", proclock);
LWLockRelease(partitionLock);
elog(WARNING, "you don't own a lock of type %s",
lockMethodTable->lockModeNames[lockmode]);
RemoveLocalLock(locallock);
return false;
}
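- Call UnGrantLock to update the lock's grant state (the request and grant counts, grantMask, and the PROCLOCK's holdMask). It also consults waitMask to decide whether waiting processes may need to be woken.
- Call CleanUpLock to finish the cleanup: it removes PROCLOCK and LOCK entries that are no longer needed and, if a wakeup is needed, calls ProcLockWakeup to wake the waiters.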
/*
* Do the releasing. CleanUpLock will waken any now-wakable waiters.
*/
wakeupNeeded = UnGrantLock(lock, lockmode, proclock, lockMethodTable);
CleanUpLock(lock, proclock,
lockMethodTable, locallock->hashcode,
wakeupNeeded);
LWLockRelease(partitionLock);
RemoveLocalLock(locallock);
return true;
}
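The bookkeeping done when a lock is ungranted can be modelled on simplified structs. This is a sketch based on the field descriptions in this post, not the actual UnGrantLock source; `DemoLock`, `DemoProcLock`, and `demo_ungrant_lock` are hypothetical names, and the conflict table is passed in by the caller.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_MAX_LOCKMODES 10
#define DEMO_LOCKBIT_ON(mode)  (1 << (mode))
#define DEMO_LOCKBIT_OFF(mode) (~(1 << (mode)))

typedef struct DemoLock {
    int grant_mask;                       /* modes currently granted */
    int wait_mask;                        /* modes currently awaited */
    int requested[DEMO_MAX_LOCKMODES];
    int n_requested;
    int granted[DEMO_MAX_LOCKMODES];
    int n_granted;
} DemoLock;

typedef struct DemoProcLock {
    int hold_mask;                        /* modes held by this process */
} DemoProcLock;

/*
 * Undo one grant of `mode`. Returns true when the released mode conflicts
 * with a mode that some waiter is asking for, i.e. a wakeup may be needed.
 * conflict_tab[mode] is a bitmask of the modes that conflict with `mode`.
 */
static bool demo_ungrant_lock(DemoLock *lock, int mode,
                              DemoProcLock *proclock, const int *conflict_tab)
{
    bool wakeup_needed = false;

    assert(lock->n_requested > 0 && lock->requested[mode] > 0);
    assert(lock->n_granted > 0 && lock->granted[mode] > 0);

    /* Drop the per-mode and total counters. */
    lock->n_requested--;
    lock->requested[mode]--;
    lock->n_granted--;
    lock->granted[mode]--;

    /* Clear the grant bit once nobody holds this mode any more. */
    if (lock->granted[mode] == 0)
        lock->grant_mask &= DEMO_LOCKBIT_OFF(mode);

    /* Waiters only benefit if they wait for a mode that conflicted with ours. */
    if (conflict_tab[mode] & lock->wait_mask)
        wakeup_needed = true;

    /* This process no longer holds the mode. */
    proclock->hold_mask &= DEMO_LOCKBIT_OFF(mode);

    return wakeup_needed;
}
```

After this step, a PROCLOCK with an empty hold mask and a lock with no remaining requests are candidates for removal, which is the cleanup part of the release.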
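LockRelease example
This example captures an intermediate state of drop table, showing a lock release that goes all the way to the main lock table.
At one point the query returns: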
postgres=# select * from pg_locks where relation=1214;
-[ RECORD 1 ]------+-----------------
locktype | relation
database | 0
relation | 1214
page |
tuple |
virtualxid |
transactionid |
classid |
objid |
objsubid |
virtualtransaction | 3/7
pid | 1131328
mode | RowExclusiveLock
granted | t
fastpath | f
waitstart |
A level-3 lock (RowExclusiveLock) is released, without going through fastpath.
LockRelease (locktag=0x7ffd581db190, lockmode=3, sessionLock=false)
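Why was fastpath not used? Because locktag_field1=0, which means the table does not belong to any particular database; such tables are usually shared system catalogs (1214 is pg_shdepend, which is indeed a shared catalog). EligibleForRelationFastPath requires locktag_field1 to match the backend's own database, so a shared catalog never qualifies.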
(gdb) p *locktag
$17 = {
locktag_field1 = 0,
locktag_field2 = 1214,
locktag_field3 = 0,
locktag_field4 = 0,
locktag_type = 0 '\000',
locktag_lockmethodid = 1 '\001'}
Values of the two main data structures at release time:
(gdb) p *lock
$19 = {
tag = {locktag_field1 = 0, locktag_field2 = 1214, locktag_field3 = 0, locktag_field4 = 0, locktag_type = 0 '\000', locktag_lockmethodid = 1 '\001'},
grantMask = 8,
waitMask = 0,
procLocks = {head = {prev = 0x7f487fd34340, next = 0x7f487fd34340}},
waitProcs = {dlist = {head = {prev = 0x7f487e715988, next = 0x7f487e715988}}, count = 0},
requested = {0, 0, 0, 1, 0, 0, 0, 0, 0, 0},
nRequested = 1,
granted = {0, 0, 0, 1, 0, 0, 0, 0, 0, 0},
nGranted = 1}
(gdb) p *proclock
$20 = {tag = {myLock = 0x7f487e715960, myProc = 0x7f4884dc4590},
groupLeader = 0x7f4884dc4590,
holdMask = 8,
releaseMask = 0,
lockLink = {prev = 0x7f487e715978, next = 0x7f487e715978},
procLink = {prev = 0x7f4884dc4668, next = 0x7f4884dc4668}}
Lock
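- grantMask = 8: 8 is binary 1000, so bit 3 is set, which corresponds to RowExclusiveLock (lock mode 3).
- waitMask = 0: no lock mode is being waited for on this object.
- procLocks: head.prev and head.next point to the same node, which means exactly one PROCLOCK is associated with this lock.
- requested[3] = 1: there is one request in RowExclusiveLock mode on this lock object.
- nRequested = 1: the total number of requests is 1, consistent with requested[3].
- granted[3] = 1: that request has been granted.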
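Reading masks such as grantMask = 8 by hand is error-prone, so a small decoder helps. The sketch below assumes that bit N of a mask stands for lock mode N and uses the standard table-level lock mode names; `demo_mode_names` and `demo_decode_lock_mask` are hypothetical helpers, not PostgreSQL functions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Lock mode names indexed by mode number (index 0 is unused). */
static const char *const demo_mode_names[] = {
    "INVALID",
    "AccessShareLock",
    "RowShareLock",
    "RowExclusiveLock",
    "ShareUpdateExclusiveLock",
    "ShareLock",
    "ShareRowExclusiveLock",
    "ExclusiveLock",
    "AccessExclusiveLock"
};

/*
 * Write the names of all modes set in `mask` into buf, separated by spaces.
 * Returns the number of modes found. buflen must be greater than zero.
 */
static int demo_decode_lock_mask(int mask, char *buf, size_t buflen)
{
    int count = 0;

    assert(buflen > 0);
    buf[0] = '\0';
    for (int mode = 1; mode <= 8; mode++) {
        if (mask & (1 << mode)) {
            size_t used = strlen(buf);

            /* Append the name, with a separating space after the first one. */
            snprintf(buf + used, buflen - used, "%s%s",
                     count ? " " : "", demo_mode_names[mode]);
            count++;
        }
    }
    return count;
}
```

Applied to the dump above, a mask of 8 decodes to a single mode, RowExclusiveLock, for both grantMask and holdMask.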
PROCLOCK
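- tag: identifies the associated LOCK object (myLock) and the owning process (myProc).
- holdMask = 8: matches the LOCK's grantMask and shows that this process holds RowExclusiveLock on the lock object.
- releaseMask = 0: this field is workspace for LockReleaseAll and is not in use here.
- lockLink: both pointers refer to the LOCK's procLocks list head (0x7f487e715978), so this PROCLOCK is the only node in that list.
- procLink: links this PROCLOCK into the owning PGPROC's list of PROCLOCKs, which lives in shared memory; prev and next are equal, so it is the only node in that list as well.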
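The "only one node" reading of the pointers follows from how a circular doubly linked list with a sentinel head behaves. The sketch below is a minimal model of such a list, not PostgreSQL's dlist implementation; `DemoNode`, `DemoList`, and the `demo_list_*` functions are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct DemoNode {
    struct DemoNode *prev;
    struct DemoNode *next;
} DemoNode;

/* The list head is a sentinel node; an empty list points at itself. */
typedef struct DemoList {
    DemoNode head;
} DemoList;

static void demo_list_init(DemoList *list)
{
    list->head.prev = list->head.next = &list->head;
}

static void demo_list_push_tail(DemoList *list, DemoNode *node)
{
    node->next = &list->head;
    node->prev = list->head.prev;
    node->prev->next = node;
    list->head.prev = node;
}

/* Exactly one element: head.next and head.prev name the same non-head node. */
static bool demo_list_has_single_node(const DemoList *list)
{
    return list->head.next != &list->head &&
           list->head.next == list->head.prev;
}
```

With a single element, the head's prev and next both point at that element, and the element's prev and next both point back at the head. That is the pattern visible in the gdb output: procLocks points at one address, and lockLink points back at 0x7f487e715978.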
/*
* Per-locked-object lock information:
*
* tag -- uniquely identifies the object being locked
* grantMask -- bitmask for all lock types currently granted on this object.
* waitMask -- bitmask for all lock types currently awaited on this object.
* procLocks -- list of PROCLOCK objects for this lock.
* waitProcs -- queue of processes waiting for this lock.
* requested -- count of each lock type currently requested on the lock
* (includes requests already granted!!).
* nRequested -- total requested locks of all types.
* granted -- count of each lock type currently granted on the lock.
* nGranted -- total granted locks of all types.
*
* Note: these counts count 1 for each backend. Internally to a backend,
* there may be multiple grabs on a particular lock, but this is not reflected
* into shared memory.
*/
typedef struct LOCK
{
/* hash key */
LOCKTAG tag; /* unique identifier of lockable object */
/* data */
LOCKMASK grantMask; /* bitmask for lock types already granted */
LOCKMASK waitMask; /* bitmask for lock types awaited */
dlist_head procLocks; /* list of PROCLOCK objects assoc. with lock */
dclist_head waitProcs; /* list of PGPROC objects waiting on lock */
int requested[MAX_LOCKMODES]; /* counts of requested locks */
int nRequested; /* total of requested[] array */
int granted[MAX_LOCKMODES]; /* counts of granted locks */
int nGranted; /* total of granted[] array */
} LOCK;
/*
* We may have several different backends holding or awaiting locks
* on the same lockable object. We need to store some per-holder/waiter
* information for each such holder (or would-be holder). This is kept in
* a PROCLOCK struct.
*
* PROCLOCKTAG is the key information needed to look up a PROCLOCK item in the
* proclock hashtable. A PROCLOCKTAG value uniquely identifies the combination
* of a lockable object and a holder/waiter for that object. (We can use
* pointers here because the PROCLOCKTAG need only be unique for the lifespan
* of the PROCLOCK, and it will never outlive the lock or the proc.)
*
* Internally to a backend, it is possible for the same lock to be held
* for different purposes: the backend tracks transaction locks separately
* from session locks. However, this is not reflected in the shared-memory
* state: we only track which backend(s) hold the lock. This is OK since a
* backend can never block itself.
*
* The holdMask field shows the already-granted locks represented by this
* proclock. Note that there will be a proclock object, possibly with
* zero holdMask, for any lock that the process is currently waiting on.
* Otherwise, proclock objects whose holdMasks are zero are recycled
* as soon as convenient.
*
* releaseMask is workspace for LockReleaseAll(): it shows the locks due
* to be released during the current call. This must only be examined or
* set by the backend owning the PROCLOCK.
*
* Each PROCLOCK object is linked into lists for both the associated LOCK
* object and the owning PGPROC object. Note that the PROCLOCK is entered
* into these lists as soon as it is created, even if no lock has yet been
* granted. A PGPROC that is waiting for a lock to be granted will also be
* linked into the lock's waitProcs queue.
*/
typedef struct PROCLOCKTAG
{
/* NB: we assume this struct contains no padding! */
LOCK *myLock; /* link to per-lockable-object information */
PGPROC *myProc; /* link to PGPROC of owning backend */
} PROCLOCKTAG;
typedef struct PROCLOCK
{
/* tag */
PROCLOCKTAG tag; /* unique identifier of proclock object */
/* data */
PGPROC *groupLeader; /* proc's lock group leader, or proc itself */
LOCKMASK holdMask; /* bitmask for lock types currently held */
LOCKMASK releaseMask; /* bitmask for lock types to be released */
dlist_node lockLink; /* list link in LOCK's list of proclocks */
dlist_node procLink; /* list link in PGPROC's list of proclocks */
} PROCLOCK;