Locks in PostgreSQL: SpinLock, LWLock, and Lock

I. Locks in PostgreSQL

Depending on the object being protected and the usage scenario, PostgreSQL uses three kinds of locks: SpinLock, LWLock, and Lock.

1. SpinLock

SpinLock is a spin lock: a mechanism for protecting shared resources under concurrency (multiple processes/threads). It is the cheapest lock to implement and is usually built on the hardware TAS (test-and-set) instruction. Its defining trait is that a process requesting the lock keeps retrying until the current holder releases it. While waiting, the process does not drop into the kernel and sleep; it busy-waits, spinning in a loop until the lock becomes available again, so it keeps consuming CPU in user mode the whole time. The lock has only one mode: exclusive.

Its proper use cases are therefore: short hold times, simple access to the protected resource, and a short critical section.
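
To illustrate the idea only (this is not PostgreSQL code), a minimal TAS spinlock can be sketched with GCC's atomic builtins; every name below is invented for the example:

/* conceptual TAS spinlock, assuming GCC/Clang __sync builtins are available */
typedef volatile int my_slock_t;

static void
my_spin_acquire(my_slock_t *lock)
{
	/* __sync_lock_test_and_set atomically stores 1 and returns the old value;
	 * getting 1 back means another process/thread already holds the lock */
	while (__sync_lock_test_and_set(lock, 1))
	{
		/* busy-wait in user space: no sleep, the CPU keeps spinning */
		while (*lock)
			;				/* cheap non-locking read before the next atomic attempt */
	}
}

static void
my_spin_release(my_slock_t *lock)
{
	__sync_lock_release(lock);	/* atomically writes 0 with release semantics */
}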

SpinLock in PostgreSQL

1. Via the CPU's TAS instruction:


/*
 * s_lock(lock) - platform-independent portion of waiting for a spinlock.
 */
int
s_lock(volatile slock_t *lock, const char *file, int line, const char *func)
{
	SpinDelayStatus delayStatus;

        // initialize the spin-delay bookkeeping for this SpinLock
	init_spin_delay(&delayStatus, file, line, func);

	while (TAS_SPIN(lock))  // TAS_SPIN calls TAS here
	{
                // spin: each iteration issues a small CPU-level delay, and once
                // the spin count exceeds ~100 this function sleeps for a random
                // interval between 1ms and 1s
		perform_spin_delay(&delayStatus);
	}

        // After getting the lock, adjust the sleep threshold based on how the
        // wait went: if we never had to sleep, raise the number of spins allowed
        // before sleeping; if we did sleep, contention is high, so lower it to
        // reduce CPU consumption.
	finish_spin_delay(&delayStatus);

	return delayStatus.delays;
}

The TAS implementation on x86_64:
#ifdef __x86_64__		/* AMD Opteron, Intel EM64T */
#define HAS_TEST_AND_SET

typedef unsigned char slock_t;

#define TAS(lock) tas(lock)

/*
 * On Intel EM64T, it's a win to use a non-locking test before the xchg proper,
 * but only when spinning.
 *
 * See also Implementing Scalable Atomic Locks for Multi-Core Intel(tm) EM64T
 * and IA32, by Michael Chynoweth and Mary R. Lee. As of this writing, it is
 * available at:
 * http://software.intel.com/en-us/articles/implementing-scalable-atomic-locks-for-multi-core-intel-em64t-and-ia32-architectures
 */
#define TAS_SPIN(lock)    (*(lock) ? 1 : TAS(lock))

static __inline__ int
tas(volatile slock_t *lock)
{
	register slock_t _res = 1;

	__asm__ __volatile__(
		"	lock			\n"
		"	xchgb	%0,%1	\n"
:		"+q"(_res), "+m"(*lock)
:		/* no inputs */
:		"memory", "cc");
	return (int) _res;
}

2. Semaphore-based implementation

If the platform the database runs on has no test-and-set instruction, SpinLock is implemented with PG semaphores. By default PG reserves 128 semaphores for SpinLocks (NUM_SPINLOCK_SEMAPHORES; the system-wide semaphore limits can be checked with cat /proc/sys/kernel/sem). The locking logic of the semaphore-based implementation looks like this:

int
tas_sema(volatile slock_t *lock)
{
	int			lockndx = *lock;

	if (lockndx <= 0 || lockndx > NUM_SPINLOCK_SEMAPHORES)
		elog(ERROR, "invalid spinlock number: %d", lockndx);
	/* Note that TAS macros return 0 if *success* */
	return !PGSemaphoreTryLock(SpinlockSemaArray[lockndx - 1]);
}
/*
 * PGSemaphoreTryLock
 *
 * Lock a semaphore only if able to do so without blocking
 */
bool
PGSemaphoreTryLock(PGSemaphore sema)
{
	int			errStatus;

	/*
	 * Note: if errStatus is -1 and errno == EINTR then it means we returned
	 * from the operation prematurely because we were sent a signal.  So we
	 * try and lock the semaphore again.
	 */
	do
	{
		errStatus = sem_trywait(PG_SEM_REF(sema));
	} while (errStatus < 0 && errno == EINTR);

	if (errStatus < 0)
	{
		if (errno == EAGAIN || errno == EDEADLK)
			return false;		/* failed to lock it */
		/* Otherwise we got trouble */
		elog(FATAL, "sem_trywait failed: %m");
	}

	return true;
}
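
In this scheme the "lock" itself stores not a flag but a 1-based index into SpinlockSemaArray, assigned when the lock is initialized. s_init_lock_sema in spin.c does this, roughly as follows (paraphrased; the exact code is version-dependent):

void
s_init_lock_sema(volatile slock_t *lock, bool nested)
{
	static uint32 counter = 0;

	/* hand out semaphore slots round-robin; the value stored in the "lock"
	 * is the 1-based index that tas_sema() above uses to find its semaphore */
	*lock = ((counter++) % NUM_SPINLOCK_SEMAPHORES) + 1;
}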

Because a SpinLock must never be held for long, PostgreSQL uses it mainly to serialize concurrent access to individual shared variables; the protected critical section is usually nothing more than a simple assignment or read.
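
A typical usage pattern looks like the sketch below. The structure and function are invented for illustration; SpinLockInit/SpinLockAcquire/SpinLockRelease are the real macros from storage/spin.h:

#include "postgres.h"
#include "storage/spin.h"

/* illustrative shared-memory structure protected by a spinlock */
typedef struct MySharedCounter
{
	slock_t		mutex;			/* protects "value" */
	uint64		value;
} MySharedCounter;

static void
my_counter_add(volatile MySharedCounter *cnt, uint64 delta)
{
	SpinLockAcquire(&cnt->mutex);	/* busy-waits in s_lock() if contended */
	cnt->value += delta;			/* keep the critical section trivial: no I/O, no elog() */
	SpinLockRelease(&cnt->mutex);
}

/* during shared-memory initialization: SpinLockInit(&cnt->mutex); */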

2. LWLock

LWLock stands for Lightweight Lock; "lightweight" is relative to the third kind of lock, Lock. It is implemented on top of SpinLock-style atomic operations and, besides the exclusive (mutual-exclusion) mode, it adds a shared mode and one special mode:

typedef enum LWLockMode
{
	LW_EXCLUSIVE,
	LW_SHARED,
	LW_WAIT_UNTIL_FREE			/* A special mode used in PGPROC->lwlockMode,
								 * when waiting for lock to become free. Not
								 * to be used as LWLockAcquire argument */
} LWLockMode;

LWLocks are mainly used to control access to shared-memory data structures, such as the clog buffers (transaction commit-status cache), shared buffers (data-page cache), and the WAL buffers.

The LWLock data structure:

typedef struct LWLock
{
	uint16		tranche;		/* tranche ID */
	pg_atomic_uint32 state;		/* state of exclusive/nonexclusive lockers */
	proclist_head waiters;		/* list of waiting PGPROCs */
#ifdef LOCK_DEBUG
	pg_atomic_uint32 nwaiters;	/* number of waiters */
	struct PGPROC *owner;		/* last exclusive owner of the lock */
#endif
} LWLock;
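
The state word packs everything the lock needs: a few flag bits, one bit for the exclusive holder, and a count of shared holders. For reference, the constants in lwlock.c look roughly like this (PG 11/12 era; exact values can differ between versions):

#define LW_FLAG_HAS_WAITERS			((uint32) 1 << 30)	/* wait queue is non-empty */
#define LW_FLAG_RELEASE_OK			((uint32) 1 << 29)	/* waiters may be woken at release */
#define LW_FLAG_LOCKED				((uint32) 1 << 28)	/* protects the wait list itself */

#define LW_VAL_EXCLUSIVE			((uint32) 1 << 24)	/* added to state by an exclusive locker */
#define LW_VAL_SHARED				1					/* added to state by each shared locker */

#define LW_LOCK_MASK				((uint32) ((1 << 25) - 1))	/* lock-holder portion of state */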

Depending on where they are used, PostgreSQL's LWLocks are further divided into tranches (sub-modules):

/*
 * Every tranche ID less than NUM_INDIVIDUAL_LWLOCKS is reserved; also,
 * we reserve additional tranche IDs for builtin tranches not included in
 * the set of individual LWLocks.  A call to LWLockNewTrancheId will never
 * return a value less than LWTRANCHE_FIRST_USER_DEFINED.
 */
typedef enum BuiltinTrancheIds
{
	LWTRANCHE_CLOG_BUFFERS = NUM_INDIVIDUAL_LWLOCKS,
	LWTRANCHE_COMMITTS_BUFFERS,
	LWTRANCHE_SUBTRANS_BUFFERS,
	LWTRANCHE_MXACTOFFSET_BUFFERS,
	LWTRANCHE_MXACTMEMBER_BUFFERS,
	LWTRANCHE_ASYNC_BUFFERS,
	LWTRANCHE_OLDSERXID_BUFFERS,
	LWTRANCHE_WAL_INSERT,
	LWTRANCHE_BUFFER_CONTENT,
	LWTRANCHE_BUFFER_IO_IN_PROGRESS,
	LWTRANCHE_REPLICATION_ORIGIN,
	LWTRANCHE_REPLICATION_SLOT_IO_IN_PROGRESS,
	LWTRANCHE_PROC,
	LWTRANCHE_BUFFER_MAPPING,
	LWTRANCHE_LOCK_MANAGER,
	LWTRANCHE_PREDICATE_LOCK_MANAGER,
	LWTRANCHE_PARALLEL_HASH_JOIN,
	LWTRANCHE_PARALLEL_QUERY_DSA,
	LWTRANCHE_SESSION_DSA,
	LWTRANCHE_SESSION_RECORD_TABLE,
	LWTRANCHE_SESSION_TYPMOD_TABLE,
	LWTRANCHE_SHARED_TUPLESTORE,
	LWTRANCHE_TBM,
	LWTRANCHE_PARALLEL_APPEND,
	LWTRANCHE_FIRST_USER_DEFINED
}			BuiltinTrancheIds;

const char *const MainLWLockNames[] = {
	"<unassigned:0>",
	"ShmemIndexLock",
	"OidGenLock",
	"XidGenLock",
	"ProcArrayLock",
	"SInvalReadLock",
	"SInvalWriteLock",
	"WALBufMappingLock",
	"WALWriteLock",
	"ControlFileLock",
	"CheckpointLock",
	"CLogControlLock",
	"SubtransControlLock",
	"MultiXactGenLock",
	"MultiXactOffsetControlLock",
	"MultiXactMemberControlLock",
	"RelCacheInitLock",
	"CheckpointerCommLock",
	"TwoPhaseStateLock",
	"TablespaceCreateLock",
	"BtreeVacuumLock",
	"AddinShmemInitLock",
	"AutovacuumLock",
	"AutovacuumScheduleLock",
	"SyncScanLock",
	"RelationMappingLock",
	"AsyncCtlLock",
	"AsyncQueueLock",
	"SerializableXactHashLock",
	"SerializableFinishedListLock",
	"SerializablePredicateLockListLock",
	"OldSerXidLock",
	"SyncRepLock",
	"BackgroundWorkerLock",
	"DynamicSharedMemoryControlLock",
	"AutoFileLock",
	"ReplicationSlotAllocationLock",
	"ReplicationSlotControlLock",
	"CommitTsControlLock",
	"CommitTsLock",
	"ReplicationOriginLock",
	"MultiXactTruncationLock",
	"OldSnapshotTimeMapLock",
	"LogicalRepWorkerLock",
	"CLogTruncationLock"
};

Initializing LWLocks:
When PG initializes shared memory and semaphores, it also initializes the LWLock array (CreateLWLocks).
Specifically:

  1. Compute the shared-memory space LWLocks need: the number of fixed LWLocks plus those of each requested named tranche (the fixed ones allocated at system initialization include buffer_mapping, lock_manager, predicate_lock_manager, parallel_query_dsa, tbm), the size of each LWLock (LWLOCK_PADDED_SIZE plus a counter that records the number of shared holders), and the space needed for the tranche bookkeeping.
  2. Allocate the memory, aligned to the cache-line size.
  3. LWLockInitialize initializes each LWLock in turn and sets its state to LW_FLAG_RELEASE_OK (a sketch of this function follows the InitializeLWLocks listing below).
/*
 * Initialize LWLocks that are fixed and those belonging to named tranches.
 */
static void
InitializeLWLocks(void)
{
	int			numNamedLocks = NumLWLocksByNamedTranches();
	int			id;
	int			i;
	int			j;
	LWLockPadded *lock;

	/* Initialize all individual LWLocks in main array */
	/* these are the individually named LWLocks listed in MainLWLockNames[] */
	for (id = 0, lock = MainLWLockArray; id < NUM_INDIVIDUAL_LWLOCKS; id++, lock++)
		LWLockInitialize(&lock->lock, id);

	/* Initialize buffer mapping LWLocks in main array */
	lock = MainLWLockArray + NUM_INDIVIDUAL_LWLOCKS;
	for (id = 0; id < NUM_BUFFER_PARTITIONS; id++, lock++)
		LWLockInitialize(&lock->lock, LWTRANCHE_BUFFER_MAPPING);

	/* Initialize lmgrs' LWLocks in main array */
	lock = MainLWLockArray + NUM_INDIVIDUAL_LWLOCKS + NUM_BUFFER_PARTITIONS;
	for (id = 0; id < NUM_LOCK_PARTITIONS; id++, lock++)
		LWLockInitialize(&lock->lock, LWTRANCHE_LOCK_MANAGER);

	/* Initialize predicate lmgrs' LWLocks in main array */
	lock = MainLWLockArray + NUM_INDIVIDUAL_LWLOCKS +
		NUM_BUFFER_PARTITIONS + NUM_LOCK_PARTITIONS;
	for (id = 0; id < NUM_PREDICATELOCK_PARTITIONS; id++, lock++)
		LWLockInitialize(&lock->lock, LWTRANCHE_PREDICATE_LOCK_MANAGER);

	/* Initialize named tranches. */
	if (NamedLWLockTrancheRequests > 0)
	{
		char	   *trancheNames;

		NamedLWLockTrancheArray = (NamedLWLockTranche *)
			&MainLWLockArray[NUM_FIXED_LWLOCKS + numNamedLocks];

		trancheNames = (char *) NamedLWLockTrancheArray +
			(NamedLWLockTrancheRequests * sizeof(NamedLWLockTranche));
		lock = &MainLWLockArray[NUM_FIXED_LWLOCKS];

		for (i = 0; i < NamedLWLockTrancheRequests; i++)
		{
			NamedLWLockTrancheRequest *request;
			NamedLWLockTranche *tranche;
			char	   *name;

			request = &NamedLWLockTrancheRequestArray[i];
			tranche = &NamedLWLockTrancheArray[i];

			name = trancheNames;
			trancheNames += strlen(request->tranche_name) + 1;
			strcpy(name, request->tranche_name);
			tranche->trancheId = LWLockNewTrancheId();
			tranche->trancheName = name;

			for (j = 0; j < request->num_lwlocks; j++, lock++)
				LWLockInitialize(&lock->lock, tranche->trancheId);
		}
	}
}
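
The LWLockInitialize function mentioned in step 3 is small; it looks roughly like this (paraphrased from lwlock.c, exact code depends on the version):

void
LWLockInitialize(LWLock *lock, int tranche_id)
{
	pg_atomic_init_u32(&lock->state, LW_FLAG_RELEASE_OK);
#ifdef LOCK_DEBUG
	pg_atomic_init_u32(&lock->nwaiters, 0);
#endif
	lock->tranche = tranche_id;
	proclist_init(&lock->waiters);
}
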
  4. LWLockRegisterTranche registers every tranche whose LWLocks have been initialized, both the built-in ones (BuiltinTrancheIds) and the user-defined named tranches.
/*
 * Register named tranches and tranches for fixed LWLocks.
 */
static void
RegisterLWLockTranches(void)
{
	int			i;

	if (LWLockTrancheArray == NULL)
	{
		LWLockTranchesAllocated = 128;
		LWLockTrancheArray = (const char **)
			MemoryContextAllocZero(TopMemoryContext,
								   LWLockTranchesAllocated * sizeof(char *));
		Assert(LWLockTranchesAllocated >= LWTRANCHE_FIRST_USER_DEFINED);
	}

	for (i = 0; i < NUM_INDIVIDUAL_LWLOCKS; ++i)
		/* register the members of the MainLWLockNames[] array */
		LWLockRegisterTranche(i, MainLWLockNames[i]);

	LWLockRegisterTranche(LWTRANCHE_BUFFER_MAPPING, "buffer_mapping");
	LWLockRegisterTranche(LWTRANCHE_LOCK_MANAGER, "lock_manager");
	LWLockRegisterTranche(LWTRANCHE_PREDICATE_LOCK_MANAGER,
						  "predicate_lock_manager");
	LWLockRegisterTranche(LWTRANCHE_PARALLEL_QUERY_DSA,
						  "parallel_query_dsa");
	LWLockRegisterTranche(LWTRANCHE_SESSION_DSA,
						  "session_dsa");
	LWLockRegisterTranche(LWTRANCHE_SESSION_RECORD_TABLE,
						  "session_record_table");
	LWLockRegisterTranche(LWTRANCHE_SESSION_TYPMOD_TABLE,
						  "session_typmod_table");
	LWLockRegisterTranche(LWTRANCHE_SHARED_TUPLESTORE,
						  "shared_tuplestore");
	LWLockRegisterTranche(LWTRANCHE_TBM, "tbm");
	LWLockRegisterTranche(LWTRANCHE_PARALLEL_APPEND, "parallel_append");
	LWLockRegisterTranche(LWTRANCHE_PARALLEL_HASH_JOIN, "parallel_hash_join");
	LWLockRegisterTranche(LWTRANCHE_SXACT, "serializable_xact");

	/* Register named tranches. */
	for (i = 0; i < NamedLWLockTrancheRequests; i++)
		LWLockRegisterTranche(NamedLWLockTrancheArray[i].trancheId,
							  NamedLWLockTrancheArray[i].trancheName);
}
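
Extensions obtain their own LWLocks through exactly these named tranches. A hedged sketch (the tranche name "my_ext" and the variables are made up; RequestNamedLWLockTranche has to run while shared memory is being sized, i.e. from a library listed in shared_preload_libraries):

#include "postgres.h"
#include "fmgr.h"
#include "storage/lwlock.h"

PG_MODULE_MAGIC;

static LWLock *my_ext_lock = NULL;

void
_PG_init(void)
{
	/* reserve one LWLock in a tranche named "my_ext" */
	RequestNamedLWLockTranche("my_ext", 1);
}

static void
my_ext_attach(void)
{
	/* once shared memory exists, look the reserved lock up by tranche name */
	my_ext_lock = &(GetNamedLWLockTranche("my_ext")[0].lock);

	LWLockAcquire(my_ext_lock, LW_EXCLUSIVE);
	/* ... touch the extension's shared state ... */
	LWLockRelease(my_ext_lock);
}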

Using LWLocks:
1. Acquiring a lock

Call LWLockAcquire(LWLock *lock, LWLockMode mode), where mode is LW_SHARED (shared) or LW_EXCLUSIVE (exclusive).
The function first tries to take the lock by inspecting the LWLock's state word and, if the lock can be granted, updates the state with an atomic compare-and-swap. If that first attempt fails, the process adds itself to the lock's wait queue and tries once more; if it succeeds on the retry it removes itself from the queue again, otherwise it sleeps until the holder wakes it.

LWLockConditionalAcquire(LWLock *lock, LWLockMode mode) is an alternative; unlike LWLockAcquire, it returns immediately if the lock cannot be obtained instead of sleeping.

LWLockAcquireOrWait waits if the lock cannot be acquired, but once the lock becomes free it returns without taking it. It is currently used for WALWriteLock: a backend that needs to flush WAL takes WALWriteLock and flushes the WAL produced by other backends along with its own, so the backends that were queued behind it in order to flush WAL usually find there is nothing left for them to do.
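
Put together, the three acquisition styles look like the sketch below (ProcArrayLock and WALWriteLock are real built-in locks; the surrounding logic is invented for illustration):

#include "postgres.h"
#include "storage/lwlock.h"

static void
lwlock_patterns(void)
{
	/* 1. blocking acquire: sleep on our semaphore until the lock is granted */
	LWLockAcquire(ProcArrayLock, LW_SHARED);
	/* ... read shared state ... */
	LWLockRelease(ProcArrayLock);

	/* 2. conditional acquire: never sleeps, just reports failure */
	if (LWLockConditionalAcquire(ProcArrayLock, LW_EXCLUSIVE))
	{
		/* ... fast path ... */
		LWLockRelease(ProcArrayLock);
	}

	/* 3. acquire-or-wait: either we get the lock, or we wait until it is free
	 * and return without holding it (the WALWriteLock "someone else may have
	 * flushed my WAL already" case) */
	if (LWLockAcquireOrWait(WALWriteLock, LW_EXCLUSIVE))
	{
		/* we hold the lock: flush WAL ourselves */
		LWLockRelease(WALWriteLock);
	}
}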

/*
 * LWLockAcquire - acquire a lightweight lock in the specified mode
 *
 * If the lock is not available, sleep until it is.  Returns true if the lock
 * was available immediately, false if we had to sleep.
 *
 * Side effect: cancel/die interrupts are held off until lock release.
 */
bool
LWLockAcquire(LWLock *lock, LWLockMode mode)
{
	PGPROC	   *proc = MyProc;
	bool		result = true;
	int			extraWaits = 0;
#ifdef LWLOCK_STATS
	lwlock_stats *lwstats;

	lwstats = get_lwlock_stats_entry(lock);
#endif

	AssertArg(mode == LW_SHARED || mode == LW_EXCLUSIVE);

	PRINT_LWDEBUG("LWLockAcquire", lock, mode);

#ifdef LWLOCK_STATS
	/* Count lock acquisition attempts */
	if (mode == LW_EXCLUSIVE)
		lwstats->ex_acquire_count++;
	else
		lwstats->sh_acquire_count++;
#endif							/* LWLOCK_STATS */

	/*
	 * We can't wait if we haven't got a PGPROC.  This should only occur
	 * during bootstrap or shared memory initialization.  Put an Assert here
	 * to catch unsafe coding practices.
	 */
	Assert(!(proc == NULL && IsUnderPostmaster));

	/* Ensure we will have room to remember the lock */
	if (num_held_lwlocks >= MAX_SIMUL_LWLOCKS)
		elog(ERROR, "too many LWLocks taken");

	/*
	 * Lock out cancel/die interrupts until we exit the code section protected
	 * by the LWLock.  This ensures that interrupts will not interfere with
	 * manipulations of data structures in shared memory.
	 */
	HOLD_INTERRUPTS();

	/*
	 * Loop here to try to acquire lock after each time we are signaled by
	 * LWLockRelease.
	 *
	 * NOTE: it might seem better to have LWLockRelease actually grant us the
	 * lock, rather than retrying and possibly having to go back to sleep. But
	 * in practice that is no good because it means a process swap for every
	 * lock acquisition when two or more processes are contending for the same
	 * lock.  Since LWLocks are normally used to protect not-very-long
	 * sections of computation, a process needs to be able to acquire and
	 * release the same lock many times during a single CPU time slice, even
	 * in the presence of contention.  The efficiency of being able to do that
	 * outweighs the inefficiency of sometimes wasting a process dispatch
	 * cycle because the lock is not free when a released waiter finally gets
	 * to run.  See pgsql-hackers archives for 29-Dec-01.
	 */
	 
	 /* main loop */
	for (;;)
	{
		bool		mustwait;

		/*
		 * Try to grab the lock the first time, we're not in the waitqueue
		 * yet/anymore.
		 */
		 /* first attempt; on success LWLockAttemptLock returns false and we exit the loop */
		mustwait = LWLockAttemptLock(lock, mode);
        /* mustwait == false means we got the lock; break out of the loop */
		if (!mustwait)
		{
			LOG_LWDEBUG("LWLockAcquire", lock, "immediately acquired lock");
			break;				/* got the lock */
		}

		/*
		 * Ok, at this point we couldn't grab the lock on the first try. We
		 * cannot simply queue ourselves to the end of the list and wait to be
		 * woken up because by now the lock could long have been released.
		 * Instead add us to the queue and try to grab the lock again. If we
		 * succeed we need to revert the queuing and be happy, otherwise we
		 * recheck the lock. If we still couldn't grab it, we know that the
		 * other locker will see our queue entries when releasing since they
		 * existed before we checked for the lock.
		 */
        /* the first attempt failed, so add ourselves to the lock's wait queue */
		/* add to the queue */
		LWLockQueueSelf(lock, mode);

		/* we're now guaranteed to be woken up if necessary */
		mustwait = LWLockAttemptLock(lock, mode);
        /* the second attempt succeeded; undo the queueing we just did */
		/* ok, grabbed the lock the second time round, need to undo queueing */
		if (!mustwait)
		{
			LOG_LWDEBUG("LWLockAcquire", lock, "acquired, undoing queue");

			LWLockDequeueSelf(lock);
			break;
		}

		/*
		 * Wait until awakened.
		 *
		 * Since we share the process wait semaphore with the regular lock
		 * manager and ProcWaitForSignal, and we may need to acquire an LWLock
		 * while one of those is pending, it is possible that we get awakened
		 * for a reason other than being signaled by LWLockRelease. If so,
		 * loop back and wait again.  Once we've gotten the LWLock,
		 * re-increment the sema by the number of additional signals received,
		 * so that the lock manager or signal manager will see the received
		 * signal when it next waits.
		 */
        /* from here on we actually wait for the lock, via PGSemaphoreLock */
		LOG_LWDEBUG("LWLockAcquire", lock, "waiting");

#ifdef LWLOCK_STATS
		lwstats->block_count++;
#endif
        /* report the wait: event type LWLock, with lock->tranche identifying the specific lock (the tranche IDs were listed earlier) */
		LWLockReportWaitStart(lock);
		TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode);
        
        /* wait for the lock */
		for (;;)
		{   /* block on our process semaphore */
			PGSemaphoreLock(proc->sem);
			if (!proc->lwWaiting)
				break;
			extraWaits++;
		}

		/* Retrying, allow LWLockRelease to release waiters again. */
		pg_atomic_fetch_or_u32(&lock->state, LW_FLAG_RELEASE_OK);

#ifdef LOCK_DEBUG
		{
			/* not waiting anymore */
			uint32		nwaiters PG_USED_FOR_ASSERTS_ONLY = pg_atomic_fetch_sub_u32(&lock->nwaiters, 1);

			Assert(nwaiters < MAX_BACKENDS);
		}
#endif
       
		TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), mode);
		LWLockReportWaitEnd();
        /* done waiting */
		LOG_LWDEBUG("LWLockAcquire", lock, "awakened");

		/* Now loop back and try to acquire lock again. */
		result = false;
	}

	TRACE_POSTGRESQL_LWLOCK_ACQUIRE(T_NAME(lock), mode);

	/* Add lock to list of locks held by this backend */
	held_lwlocks[num_held_lwlocks].lock = lock;
	held_lwlocks[num_held_lwlocks++].mode = mode;

	/*
	 * Fix the process wait semaphore's count for any absorbed wakeups.
	 */
    /* re-increment the semaphore for any wakeups we absorbed while waiting */
	while (extraWaits-- > 0)
		PGSemaphoreUnlock(proc->sem);

	return result;
}
  2. Waiting for the lock:

Waiting is done by PGSemaphoreLock: when the lock cannot be acquired, the process waits on its own semaphore, proc->sem (it sleeps, so it consumes no CPU).

/*
 * PGSemaphoreLock
 *
 * Lock a semaphore (decrement count), blocking if count would be < 0
 */
void
PGSemaphoreLock(PGSemaphore sema)
{
	int			errStatus;

	/* See notes in sysv_sema.c's implementation of PGSemaphoreLock. */
	do
	{   /* sem_wait waits on the semaphore: if its value is greater than 0 it */
	    /* is decremented and the call returns immediately; if it is 0, the */
	    /* caller blocks.  This is the classic P operation: returns 0 on */
	    /* success, -1 on failure.  The semaphore was initialized earlier with sem_init. */
		errStatus = sem_wait(PG_SEM_REF(sema));
	} while (errStatus < 0 && errno == EINTR);

	if (errStatus < 0)
		elog(FATAL, "sem_wait failed: %m");
}
  3. Releasing the lock:
    This is done by LWLockRelease(LWLock *lock).
3. Lock

Lock is PostgreSQL's heavyweight lock, used mainly for operations on database objects. Its modes are:

/* NoLock is not a lock mode, but a flag value meaning "don't get a lock" */
#define NoLock					0

#define AccessShareLock			1	/* SELECT */
#define RowShareLock			2	/* SELECT FOR UPDATE/FOR SHARE */
#define RowExclusiveLock		3	/* INSERT, UPDATE, DELETE */
#define ShareUpdateExclusiveLock 4	/* VACUUM (non-FULL),ANALYZE, CREATE INDEX
									 * CONCURRENTLY */
#define ShareLock				5	/* CREATE INDEX (WITHOUT CONCURRENTLY) */
#define ShareRowExclusiveLock	6	/* like EXCLUSIVE MODE, but allows ROW
									 * SHARE */
#define ExclusiveLock			7	/* blocks ROW SHARE/SELECT...FOR UPDATE */
#define AccessExclusiveLock		8	/* ALTER TABLE, DROP TABLE, VACUUM FULL,
									 * and unqualified LOCK TABLE */
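
Normal code rarely calls LockAcquire directly for these modes; it goes through higher-level helpers. For example (a PG 12-style sketch; older releases use heap_open/heap_close, and the function and flow here are illustrative only):

#include "postgres.h"
#include "access/table.h"

static void
read_my_table(Oid relid)
{
	Relation	rel;

	/* a plain SELECT takes AccessShareLock on the relation */
	rel = table_open(relid, AccessShareLock);

	/* ... scan the relation ... */

	/* release now, or pass NoLock to keep the lock until end of transaction */
	table_close(rel, AccessShareLock);
}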

Lock shows up in many scenarios; let's take the execution stack of a blocked UPDATE statement and walk through the acquire and wait path.


[postgres@postgres_zabbix ~]$ pstack 21318
#0  0x00007f1706ee95e3 in __epoll_wait_nocancel () from /lib64/libc.so.6
#1  0x0000000000853571 in WaitEventSetWaitBlock (set=0x2164778, cur_timeout=-1, occurred_events=0x7ffcac6f15b0, nevents=1) at latch.c:1080
#2  0x000000000085344c in WaitEventSetWait (set=0x2164778, timeout=-1, occurred_events=0x7ffcac6f15b0, nevents=1, wait_event_info=50331652) at latch.c:1032
#3  0x0000000000852d38 in WaitLatchOrSocket (latch=0x7f17001cf5c4, wakeEvents=33, sock=-1, timeout=-1, wait_event_info=50331652) at latch.c:407
#4  0x0000000000852c03 in WaitLatch (latch=0x7f17001cf5c4, wakeEvents=33, timeout=0, wait_event_info=50331652) at latch.c:347
#5  0x0000000000867ccb in ProcSleep (locallock=0x2072938, lockMethodTable=0xb8f5a0 <default_lockmethod>) at proc.c:1289
#6  0x0000000000861f98 in WaitOnLock (locallock=0x2072938, owner=0x2081d40) at lock.c:1768
#7  0x00000000008610be in LockAcquireExtended (locktag=0x7ffcac6f1a90, lockmode=5, sessionLock=false, dontWait=false, reportMemoryError=true, locallockp=0x0) at lock.c:1050
#8  0x0000000000860713 in LockAcquire (locktag=0x7ffcac6f1a90, lockmode=5, sessionLock=false, dontWait=false) at lock.c:713
#9  0x000000000085f592 in XactLockTableWait (xid=501, rel=0x7f1707c8fb10, ctid=0x7ffcac6f1b44, oper=XLTW_Update) at lmgr.c:658
#10 0x00000000004c99c9 in heap_update (relation=0x7f1707c8fb10, otid=0x7ffcac6f1e60, newtup=0x2164708, cid=1, crosscheck=0x0, wait=true, tmfd=0x7ffcac6f1d60, lockmode=0x7ffcac6f1d5c) at heapam.c:3228
#11 0x00000000004d411c in heapam_tuple_update (relation=0x7f1707c8fb10, otid=0x7ffcac6f1e60, slot=0x2162108, cid=1, snapshot=0x2074500, crosscheck=0x0, wait=true, tmfd=0x7ffcac6f1d60, lockmode=0x7ffcac6f1d5c, update_indexes=0x7ffcac6f1d5b) at heapam_handler.c:332
#12 0x00000000006db007 in table_tuple_update (rel=0x7f1707c8fb10, otid=0x7ffcac6f1e60, slot=0x2162108, cid=1, snapshot=0x2074500, crosscheck=0x0, wait=true, tmfd=0x7ffcac6f1d60, lockmode=0x7ffcac6f1d5c, update_indexes=0x7ffcac6f1d5b) at ../../../src/include/access/tableam.h:1275
#13 0x00000000006dce83 in ExecUpdate (mtstate=0x2160b40, tupleid=0x7ffcac6f1e60, oldtuple=0x0, slot=0x2162108, planSlot=0x21613a0, epqstate=0x2160c38, estate=0x21607c0, canSetTag=true) at nodeModifyTable.c:1311
#14 0x00000000006de36c in ExecModifyTable (pstate=0x2160b40) at nodeModifyTable.c:2222
#15 0x00000000006b2b07 in ExecProcNodeFirst (node=0x2160b40) at execProcnode.c:445
#16 0x00000000006a8ce7 in ExecProcNode (node=0x2160b40) at ../../../src/include/executor/executor.h:239
#17 0x00000000006ab063 in ExecutePlan (estate=0x21607c0, planstate=0x2160b40, use_parallel_mode=false, operation=CMD_UPDATE, sendTuples=false, numberTuples=0, direction=ForwardScanDirection, dest=0x2146860, execute_once=true) at execMain.c:1646
#18 0x00000000006a91c4 in standard_ExecutorRun (queryDesc=0x2152ab0, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:364
#19 0x00000000006a9069 in ExecutorRun (queryDesc=0x2152ab0, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:308
#20 0x000000000088017a in ProcessQuery (plan=0x2146780, sourceText=0x204b040 "update test_tbl set id=4 where id=3;", params=0x0, queryEnv=0x0, dest=0x2146860, completionTag=0x7ffcac6f2270 "") at pquery.c:161
#21 0x00000000008818c1 in PortalRunMulti (portal=0x20b6570, isTopLevel=true, setHoldSnapshot=false, dest=0x2146860, altdest=0x2146860, completionTag=0x7ffcac6f2270 "") at pquery.c:1283
#22 0x0000000000880efb in PortalRun (portal=0x20b6570, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x2146860, altdest=0x2146860, completionTag=0x7ffcac6f2270 "") at pquery.c:796
#23 0x000000000087b28f in exec_simple_query (query_string=0x204b040 "update test_tbl set id=4 where id=3;") at postgres.c:1215
#24 0x000000000087f30f in PostgresMain (argc=1, argv=0x207a6e0, dbname=0x207a578 "postgres", username=0x207a558 "postgres") at postgres.c:4247
#25 0x00000000007e6a9e in BackendRun (port=0x20702a0) at postmaster.c:4437
#26 0x00000000007e629d in BackendStartup (port=0x20702a0) at postmaster.c:4128
#27 0x00000000007e293d in ServerLoop () at postmaster.c:1704
#28 0x00000000007e21fd in PostmasterMain (argc=1, argv=0x2045c00) at postmaster.c:1377
#29 0x000000000070f76d in main (argc=1, argv=0x2045c00) at main.c:228
[postgres@postgres_zabbix ~]$

This is an UPDATE that is blocked and currently waiting for a lock (waiting for the transaction that already holds the lock to commit):

postgres=# select pid,wait_event_type,wait_event,query from pg_stat_activity where pid=21318;
-[ RECORD 1 ]---+-------------------------------------
pid             | 21318
wait_event_type | Lock
wait_event      | transactionid
query           | update test_tbl set id=4 where id=3;

Acquiring the lock:

The stack shows LockAcquire(locktag=0x7ffcac6f1a90, lockmode=5, sessionLock=false, dontWait=false): the requested LockMode is 5, i.e. ShareLock.
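
XactLockTableWait (frame #9 in the stack) is what turns "wait for that transaction to finish" into a heavyweight-lock wait: every transaction holds an ExclusiveLock on its own transaction ID, so taking a ShareLock on that ID blocks until the transaction commits or aborts. A simplified sketch of its core (the real function also handles subtransactions; the wrapper name here is invented):

#include "postgres.h"
#include "storage/lock.h"

static void
wait_for_xact(TransactionId xid)
{
	LOCKTAG		tag;

	SET_LOCKTAG_TRANSACTION(tag, xid);	/* lock object = the other transaction's ID */

	/* blocks until the owning transaction releases its ExclusiveLock at commit/abort */
	(void) LockAcquire(&tag, ShareLock, false, false);

	/* we only wanted to wait, not to keep holding the lock */
	LockRelease(&tag, ShareLock, false);
}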

LockAcquire is a long function; here is just an outline of its logic:
1) Use the information about the object to lock given in the locktag to look the lock up in a hash table. Because the same lock may be taken many times, these locks are cached in a local hash table to speed up repeated access.
For example:
when locking a table, the database OID, relation OID and similar identifiers are stored in the locktag;
when locking a tuple, the database OID, relation OID, block number and offset are stored in it. The SET_LOCKTAG_XXX macros fill in the corresponding LOCKTAG (see the sketch after this list).
So the first step is to search the LOCALLOCK hash table and act on the result:

locallock = (LOCALLOCK *) hash_search(LockMethodLocalHash,
										  (void *) &localtag,
										  HASH_ENTER, &found);

2) Check whether this backend has already acquired the requested lock on the object.
3) When required, write a WAL record for the lock acquisition (done for AccessExclusiveLock so that standbys can replay it).
4) Run lock-conflict detection.
5) If there is no conflict, grant the lock and record how the resource is using it; if there is a conflict, wait in WaitOnLock (implemented on top of epoll underneath).
6) Report the result of the acquisition.
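
A minimal sketch of steps 1 and 5 for a relation lock (illustrative only; real callers normally go through LockRelationOid and friends):

#include "postgres.h"
#include "storage/lock.h"

static void
lock_relation_example(Oid dbOid, Oid relOid)
{
	LOCKTAG		tag;

	/* step 1: describe the object to lock in the locktag */
	SET_LOCKTAG_RELATION(tag, dbOid, relOid);

	/* steps 4/5: conflict check, then grant or sleep in WaitOnLock/ProcSleep */
	(void) LockAcquire(&tag, RowExclusiveLock, false, false);

	/* ... do the work; in practice the lock is usually held until end of transaction ... */

	LockRelease(&tag, RowExclusiveLock, false);
}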

Waiting for the lock:
WaitOnLock(locallock=0x2072938, owner=0x2081d40) is called.
The wait is ultimately implemented with epoll; the top frame of the stack is inside epoll_wait:

#0  0x00007f1706ee95e3 in __epoll_wait_nocancel () from /lib64/libc.so.6

Releasing the lock:

LockRelease(const LOCKTAG *locktag, LOCKMODE lockmode, bool sessionLock) is called.

We will not go through it in detail; the locks are released when the transaction commits or rolls back.

II. Comparing PostgreSQL's locks

SpinLock
Main characteristics: the lightest lock; exclusive mode only; no wait queue; waiters busy-wait and spin on the CPU.

Use cases: short hold times, simple access to the protected resource, and short critical sections, typically a plain assignment or read.

LWLock
Main characteristics: lightweight lock; a shared mode in addition to the exclusive mode; has a wait queue; waiting is implemented with sem_wait.

Use cases: longer critical sections with more complex logic and more involved operations on the shared resource, e.g. the clog buffers (transaction commit-status cache), shared buffers (data-page cache), and WAL buffers.

Lock

Main characteristics: heavyweight lock; can be held for a long time; waiting is implemented with epoll.

Use cases: operations on database objects, such as inserting into, updating, deleting from, and querying tables.

