2021@SDUSC
Overview
Last week's post analyzed the three phenomena that SQL isolation must prevent between concurrent transactions: dirty reads, non-repeatable reads, and phantom reads. This week we will look at the three kinds of locks in PostgreSQL: SpinLock, LWLock, and RegularLock.
SpinLock
SpinLock is the lowest-level lock. It is implemented with mutual-exclusion primitives and is closely tied to the operating system and the hardware. The SpinLock implementation is split into a machine-dependent part (defined in s_lock.c) and a machine-independent part (defined in spin.c). SpinLock's characteristics are: it is held only for very short periods, it has no wait queue and no deadlock detection, and it is not released automatically at transaction end.
If the machine has a TAS (test-and-set) instruction, PostgreSQL uses the SpinLock implementation defined in s_lock.h and s_lock.c. If the machine has no TAS instruction, the hardware-independent SpinLock implementation in spin.c is used, which relies on PostgreSQL's own semaphore abstraction, PGSemaphore.
static void
s_lock_stuck(const char *file, int line, const char *func)
{
    if (!func)
        func = "(unknown)";
#if defined(S_LOCK_TEST)
    fprintf(stderr,
            "\nStuck spinlock detected at %s, %s:%d.\n",
            func, file, line);
    exit(1);
#else
    elog(PANIC, "stuck spinlock detected at %s, %s:%d",
         func, file, line);
#endif
}

/*
 * s_lock(lock) - platform-independent portion of waiting for a spinlock.
 */
int
s_lock(volatile slock_t *lock, const char *file, int line, const char *func)
{
    SpinDelayStatus delayStatus;

    init_spin_delay(&delayStatus, file, line, func);

    while (TAS_SPIN(lock))
    {
        perform_spin_delay(&delayStatus);
    }

    finish_spin_delay(&delayStatus);

    return delayStatus.delays;
}

#ifdef USE_DEFAULT_S_UNLOCK
void
s_unlock(volatile slock_t *lock)
{
#ifdef TAS_ACTIVE_WORD
    /* HP's PA-RISC */
    *TAS_ACTIVE_WORD(lock) = -1;
#else
    *lock = 0;
#endif
}
#endif
LWLock
LWLock (lightweight lock) mainly provides mutually exclusive access to data structures in shared memory. An LWLock has two lock modes: exclusive mode and shared mode. Lightweight locks provide no deadlock detection, but the lightweight lock manager releases them automatically during elog error recovery, so calling elog to raise an error while holding a lightweight lock does not leak the lock.
LWLock is built on top of SpinLock, so when there is no contention an LWLock can be acquired and released very quickly. When a process blocks on a lightweight lock it is in fact blocked on a semaphore, so it consumes no CPU time, and waiting processes are granted the lock in first-come, first-served order.
In short, LWLock's characteristics are: it has a wait queue, no deadlock detection, and automatic release.
The LWLock data structure
typedef struct LWLock
{
    slock_t mutex;      /* protects the LWLock and the process queue */
    bool    releaseOK;  /* true if it is OK to release waiters */
    char    exclusive;  /* # of backends holding the lock exclusively (0 or 1) */
    int     shared;     /* # of backends holding the lock shared (0..MaxBackends) */
    PROC   *head;       /* head of the queue of waiting processes */
    PROC   *tail;       /* tail of the queue; undefined when head is NULL */
} LWLock;
As the structure shows, an LWLock contains a mutex field. Locking and unlocking this mutex with a SpinLock implements mutually exclusive access to the LWLock itself. The system manages all LWLocks in a single global array, LWLockArray.
Main operations on LWLock
(1) LWLock space allocation
After Postmaster starts, it must allocate space for the LWLocks in shared memory; this is done in the function LWLockShmemSize. The allocation first requires computing the number of LWLocks to allocate; that count is produced by the function NumLWLocks, using the formula shown below:
/*
 * Compute number of LWLocks required by named tranches. These will be
 * allocated in the main array.
 */
static int
NumLWLocksByNamedTranches(void)
{
    int     numLocks = 0;
    int     i;

    for (i = 0; i < NamedLWLockTrancheRequests; i++)
        numLocks += NamedLWLockTrancheRequestArray[i].num_lwlocks;

    return numLocks;
}

/*
 * Compute shmem space needed for LWLocks and named tranches.
 */
Size
LWLockShmemSize(void)
{
    Size    size;
    int     i;
    int     numLocks = NUM_FIXED_LWLOCKS;

    numLocks += NumLWLocksByNamedTranches();

    /* Space for the LWLock array. */
    size = mul_size(numLocks, sizeof(LWLockPadded));

    /* Space for dynamic allocation counter, plus room for alignment. */
    size = add_size(size, sizeof(int) + LWLOCK_PADDED_SIZE);

    /* space for named tranches. */
    size = add_size(size, mul_size(NamedLWLockTrancheRequests,
                                   sizeof(NamedLWLockTranche)));

    /* space for name of each tranche. */
    for (i = 0; i < NamedLWLockTrancheRequests; i++)
        size = add_size(size,
                        strlen(NamedLWLockTrancheRequestArray[i].tranche_name) + 1);

    /* Disallow named LWLocks' requests after startup */
    lock_named_request_allowed = false;

    return size;
}
numLocks = (int) NumFixedLWLocks + 2 * NBuffers + NUM_CLOG_BUFFERS
         + NUM_SUBTRANS_BUFFERS + NUM_MXACTOFFSET_BUFFERS
         + NUM_MXACTMEMBER_BUFFERS
         + Max(lock_addin_request, NUM_USER_DEFINED_LWLOCKS)
| Variable | Meaning |
| --- | --- |
| NumFixedLWLocks | number of predefined LWLocks |
| NBuffers | number of buffers; each buffer needs an LWLock |
| NUM_CLOG_BUFFERS | number of CLOG buffers; each CLOG buffer needs an LWLock |
| NUM_SUBTRANS_BUFFERS | number of SubTrans buffers; each SubTrans buffer needs an LWLock |
| NUM_MXACTOFFSET_BUFFERS | each SLRU offset buffer in MULTIXACT needs an LWLock (offset buffers store MultiXactIds; the system uses 8 of them. In MVCC, a MULTIXACT identifies the multiple transactions that have handled the current tuple; those transaction IDs together make up the MULTIXACT, the system identifies it with a MultiXactId, and each transaction ID is a member of the MULTIXACT) |
| NUM_MXACTMEMBER_BUFFERS | each SLRU member buffer in MULTIXACT needs an LWLock (member buffers store the individual members of each MULTIXACT; in MVCC a tuple carries at most two transaction IDs (Xmin and Xmax), so the number of member buffers is twice the number of offset buffers, i.e. 16) |
| lock_addin_request | number of LWLocks requested by add-ins |
| NUM_USER_DEFINED_LWLOCKS | extra LWLocks reserved for user-defined use |
(2) LWLock creation
Creating LWLocks consists of two steps: allocating space and initializing. The space allocation is performed by the LWLockShmemSize function shown above; initialization then sets up each LWLock structure so that it starts out in the "unlocked" state.
(3) LWLock assignment
LWLock assignment is defined in the function LWLockAssign. Its main job is to hand out one LWLock from the pool of spare LWLocks predefined in shared memory (the NUM_USER_DEFINED_LWLOCKS entries in the table above), incrementing the counter stored in front of LWLockArray as it does so. The basic procedure is: acquire the SpinLock (to access LWLockArray), use the counter to determine how many spare LWLocks remain, and report an error if there are none; otherwise update the counter, release the SpinLock, and return a pointer to a spare LWLock.
(4) Acquiring an LWLock
Acquiring an LWLock is defined by the function LWLockAcquire, which tries to obtain the lightweight lock identified by the given LWLock ID in the given mode (SHARED/EXCLUSIVE). If the lock cannot be obtained immediately, the process sleeps until the lock becomes free. The function proceeds as follows:
1) Hold off interrupts and acquire the SpinLock.
2) Check the lock's current state; if it is free, take the lock, release the SpinLock, and return.
3) Otherwise, add the current process to the wait queue, release the SpinLock, and wait until woken up.
4) Repeat the steps above.
LWLock also offers another way to acquire a lock: the function LWLockConditionalAcquire, which differs from LWLockAcquire in that it returns TRUE if the lock could be acquired and FALSE otherwise.
bool
LWLockAcquire(LWLock *lock, LWLockMode mode)
{
    PGPROC     *proc = MyProc;
    bool        result = true;
    int         extraWaits = 0;
#ifdef LWLOCK_STATS
    lwlock_stats *lwstats;

    lwstats = get_lwlock_stats_entry(lock);
#endif

    AssertArg(mode == LW_SHARED || mode == LW_EXCLUSIVE);

    PRINT_LWDEBUG("LWLockAcquire", lock, mode);

#ifdef LWLOCK_STATS
    /* Count lock acquisition attempts */
    if (mode == LW_EXCLUSIVE)
        lwstats->ex_acquire_count++;
    else
        lwstats->sh_acquire_count++;
#endif                          /* LWLOCK_STATS */

    Assert(!(proc == NULL && IsUnderPostmaster));

    if (num_held_lwlocks >= MAX_SIMUL_LWLOCKS)
        elog(ERROR, "too many LWLocks taken");

    HOLD_INTERRUPTS();

    for (;;)
    {
        bool        mustwait;

        /*
         * Try to grab the lock the first time, we're not in the waitqueue
         * yet/anymore.
         */
        mustwait = LWLockAttemptLock(lock, mode);

        if (!mustwait)
        {
            LOG_LWDEBUG("LWLockAcquire", lock, "immediately acquired lock");
            break;              /* got the lock */
        }

        /* add to the queue */
        LWLockQueueSelf(lock, mode);

        /* we're now guaranteed to be woken up if necessary */
        mustwait = LWLockAttemptLock(lock, mode);

        /* ok, grabbed the lock the second time round, need to undo queueing */
        if (!mustwait)
        {
            LOG_LWDEBUG("LWLockAcquire", lock, "acquired, undoing queue");
            LWLockDequeueSelf(lock);
            break;
        }

        LOG_LWDEBUG("LWLockAcquire", lock, "waiting");

#ifdef LWLOCK_STATS
        lwstats->block_count++;
#endif

        LWLockReportWaitStart(lock);
        TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode);

        for (;;)
        {
            PGSemaphoreLock(proc->sem);
            if (!proc->lwWaiting)
                break;
            extraWaits++;
        }

        /* Retrying, allow LWLockRelease to release waiters again. */
        pg_atomic_fetch_or_u32(&lock->state, LW_FLAG_RELEASE_OK);

#ifdef LOCK_DEBUG
        {
            /* not waiting anymore */
            uint32      nwaiters PG_USED_FOR_ASSERTS_ONLY =
                pg_atomic_fetch_sub_u32(&lock->nwaiters, 1);

            Assert(nwaiters < MAX_BACKENDS);
        }
#endif

        TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), mode);
        LWLockReportWaitEnd();

        LOG_LWDEBUG("LWLockAcquire", lock, "awakened");

        /* Now loop back and try to acquire lock again. */
        result = false;
    }

    TRACE_POSTGRESQL_LWLOCK_ACQUIRE(T_NAME(lock), mode);

    while (extraWaits-- > 0)
        PGSemaphoreUnlock(proc->sem);

    return result;
}
(5) Releasing an LWLock
An LWLock is released by the function LWLockRelease, whose main job is to release the lock with the given lock ID. The function proceeds as follows:
1) Acquire the lock's SpinLock.
2) Check whether this release should wake up other waiting processes.
3) If processes need to be woken, walk the wait queue; each process requesting a shared (read) lock is removed from the queue while keeping a pointer to it, repeating until a process requesting an exclusive (write) lock is reached.
4) Release the LWLock's SpinLock, then wake up the processes removed from the queue.
In addition, PostgreSQL can release all LWLocks held by the current backend. This functionality is defined in the function LWLockReleaseAll and is used mainly after an error has occurred. Its implementation is straightforward: it repeatedly calls LWLockRelease to release each held LWLock.
/*
 * LWLockRelease - release a previously acquired lock
 */
void
LWLockRelease(LWLock *lock)
{
    LWLockMode  mode;
    uint32      oldstate;
    bool        check_waiters;
    int         i;

    /*
     * Remove lock from list of locks held. Usually, but not always, it will
     * be the latest-acquired lock; so search array backwards.
     */
    for (i = num_held_lwlocks; --i >= 0;)
        if (lock == held_lwlocks[i].lock)
            break;

    if (i < 0)
        elog(ERROR, "lock %s is not held", T_NAME(lock));

    mode = held_lwlocks[i].mode;

    num_held_lwlocks--;
    for (; i < num_held_lwlocks; i++)
        held_lwlocks[i] = held_lwlocks[i + 1];

    PRINT_LWDEBUG("LWLockRelease", lock, mode);

    /*
     * Release my hold on lock, after that it can immediately be acquired by
     * others, even if we still have to wakeup other waiters.
     */
    if (mode == LW_EXCLUSIVE)
        oldstate = pg_atomic_sub_fetch_u32(&lock->state, LW_VAL_EXCLUSIVE);
    else
        oldstate = pg_atomic_sub_fetch_u32(&lock->state, LW_VAL_SHARED);

    /* nobody else can have that kind of lock */
    Assert(!(oldstate & LW_VAL_EXCLUSIVE));

    /*
     * We're still waiting for backends to get scheduled, don't wake them up
     * again.
     */
    if ((oldstate & (LW_FLAG_HAS_WAITERS | LW_FLAG_RELEASE_OK)) ==
        (LW_FLAG_HAS_WAITERS | LW_FLAG_RELEASE_OK) &&
        (oldstate & LW_LOCK_MASK) == 0)
        check_waiters = true;
    else
        check_waiters = false;

    /*
     * As waking up waiters requires the spinlock to be acquired, only do so
     * if necessary.
     */
    if (check_waiters)
    {
        /* XXX: remove before commit? */
        LOG_LWDEBUG("LWLockRelease", lock, "releasing waiters");
        LWLockWakeup(lock);
    }

    TRACE_POSTGRESQL_LWLOCK_RELEASE(T_NAME(lock));

    /*
     * Now okay to allow cancel/die interrupts.
     */
    RESUME_INTERRUPTS();
}
Summary
Only RegularLock remains before the analysis of the three locks that guard mutually exclusive access to critical sections in concurrency control is complete. Comments and corrections are welcome.