PostgreSQL Fundamentals: Buffer Pool Management

References

《PostgreSQL数据库内核分析》 (PostgreSQL Database Kernel Analysis), Peng Zhiyong and Peng Yuwei, pp. 99-101

Overview

In PostgreSQL, all operations on tables, tuples, indexes, and so on take place in the buffer pool, and data is scheduled in and out of the pool in units of disk blocks: a block that needs to be accessed is read into the pool with smgrread, and smgrwrite writes buffer pool data back to disk. A disk block loaded into the pool occupies a buffer, and the buffers together make up the buffer pool.

PostgreSQL has two kinds of buffer pools: the shared buffer pool and the local buffer pool. The shared buffer pool is where ordinary tables are operated on, while the local buffer pool serves temporary tables that are visible only to the local backend. This article covers only the shared buffer pool.

Buffers in the buffer pool are managed through two mechanisms (a sketch of the typical calling pattern follows this list):

  1. pin

    Before a process accesses a buffer it pins the buffer; the number of pins is kept in the buffer's refcount field. A nonzero refcount means some process is currently using the buffer, so the buffer cannot be evicted.

  2. lock

    The lock mechanism guards concurrent access to a buffer: a process takes an EXCLUSIVE lock when writing the buffer and a SHARE lock when reading it. For example, an INSERT must take an EXCLUSIVE lock on the buffer right after obtaining it (the locking is done in RelationGetBufferForTuple; see the insert path for details).
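
The typical calling pattern in backend code looks roughly as follows. This is a minimal sketch (it assumes an already-opened Relation rel and a valid block number blkno): ReadBuffer pins the buffer, LockBuffer takes the content lock, and UnlockReleaseBuffer drops both.

/* Minimal sketch of the pin/lock pattern (rel and blkno are assumed to exist). */
Buffer		buf;
Page		page;

buf = ReadBuffer(rel, blkno);				/* pin: the buffer's refcount is incremented */
LockBuffer(buf, BUFFER_LOCK_SHARE);			/* content lock for reading */
page = BufferGetPage(buf);
/* ... read tuples on the page ... */
UnlockReleaseBuffer(buf);					/* release the content lock and drop the pin */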

Initializing the Shared Buffer Pool

Initialization of the shared buffer pool is done by InitBufferPool. Shared buffer pool management uses a global array, BufferDescriptors, whose elements are of type BufferDesc, to manage the buffers in the pool, plus a global pointer, BufferBlocks, which holds the starting address of the buffer pool itself.

Let us first look at the definition of BufferDesc:

typedef struct BufferDesc
{
	BufferTag	tag;					/* ID of page contained in buffer */
	int			buf_id;					/* buffer's index number (from 0) */

	/* state of the tag, containing flags, refcount and usagecount */
	pg_atomic_uint32 state;

	int			wait_backend_pid;		/* backend PID of pin-count waiter */
	int			freeNext;				/* link in freelist chain */

	LWLock		content_lock;			/* to lock access to buffer contents */
} BufferDesc;

Where:

  • tag: identifies the physical block held in this buffer. It is defined as follows:

    typedef struct buftag
    {
    	RelFileNode rnode;			/* tablespace OID, database OID, and relation OID */
    	ForkNumber	forkNum;		/* enum marking which fork of the relation this block belongs to */
    	BlockNumber blockNum;		/* block number */
    } BufferTag;
    

    The tag uniquely identifies a physical block (note: a physical block, not a buffer). It comes up again in the buffer loading flow below.

  • buf_id: the buffer's index number; buf_id uniquely identifies a buffer, and every operation on a buffer uses it.

    Both shared and local buffers carry a buf_id, but the numbering rules differ: shared buffers are numbered from 0 upward, while local buffers are numbered from -2 downward.

    /* local buffers are numbered starting from -2 */
    #define LocalBufHdrGetBlock(bufHdr) LocalBufferBlockPointers[-((bufHdr)->buf_id + 2)]
    
  • state: packs together flags, refcount, and usagecount (see the layout sketch after this list).

    • flags: flag bits, e.g. whether the buffer is dirty.
    • refcount: the number of processes currently referencing this buffer; it is modified through the pin operation.
    • usagecount: how often the buffer has been used recently; used by buffer replacement.
  • wait_backend_pid: records the PID of a backend waiting to modify the buffer (the pin-count waiter).

  • freeNext: if the buffer is on the freelist, freeNext points to the next free buffer.

  • content_lock: when a process accesses the buffer contents it locks content_lock, LW_SHARED for reads and LW_EXCLUSIVE for writes. This lock prevents data inconsistency caused by conflicting concurrent access to the buffer.
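
For reference, here is a sketch of how state is unpacked (the masks are those used by buf_internals.h in this era of the code; treat the exact bit positions as an assumption if your version differs): the low 18 bits hold refcount, the next 4 bits hold usagecount, and the top 10 bits hold the flags. The BUF_STATE_GET_* macros below appear repeatedly in the code later in this article.

/* Sketch of the state bit layout (see buf_internals.h; verify against your version). */
#define BUF_REFCOUNT_MASK		((1U << 18) - 1)
#define BUF_USAGECOUNT_MASK		0x003C0000U
#define BUF_USAGECOUNT_ONE		(1U << 18)
#define BUF_USAGECOUNT_SHIFT	18
#define BUF_FLAG_MASK			0xFFC00000U

#define BUF_STATE_GET_REFCOUNT(state)	((state) & BUF_REFCOUNT_MASK)
#define BUF_STATE_GET_USAGECOUNT(state) (((state) & BUF_USAGECOUNT_MASK) >> BUF_USAGECOUNT_SHIFT)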

Buffer Operations

As mentioned above, shared buffer pool management uses two global variables: the BufferDesc array BufferDescriptors and the BufferBlocks pointer. What is the relationship between them, and how do you convert between the two?

First, BufferDescriptors is an array of N elements, where N is the number of buffers in the pool (NBuffers, 1000 by default in the build described here). BufferBlocks is a contiguous chunk of memory of size BLCKSZ * N, so it can also be viewed as an array of N elements, each element being one buffer.

BufferDesc has a member buf_id, which is the index of that BufferDesc within BufferDescriptors, i.e.

BufferDesc == BufferDescriptors[BufferDesc->buf_id].

So given a buf_id you can fetch the BufferDesc from BufferDescriptors, and you can also fetch the actual buffer from BufferBlocks. The macros below show how:

/* returns a buffer number; subsequent operations are based on this number */
#define BufferDescriptorGetBuffer(bdesc) ((bdesc)->buf_id + 1)
/* fetch a BufferDesc from BufferDescriptors */
#define GetBufferDescriptor(id) (&BufferDescriptors[(id)].bufferdesc)
/* fetch the data page of a buffer backed by BufferBlocks */
#define BufferGetPage(buffer) ((Page)BufferGetBlock(buffer))

#define BufferIsLocal(buffer)	((buffer) < 0)	/* is this a local buffer? */

#define BufferGetBlock(buffer) \
( \
	AssertMacro(BufferIsValid(buffer)), \
	BufferIsLocal(buffer) ? \
		LocalBufferBlockPointers[-(buffer) - 1] \
	: \
		(Block) (BufferBlocks + ((Size) ((buffer) - 1)) * BLCKSZ) \
)

GetBufferDescriptor takes the array index directly, whereas BufferGetBlock must be given the value returned by BufferDescriptorGetBuffer, i.e. the array index plus 1. The reason for the +1 is that buffer number 0 is reserved for InvalidBuffer, so valid shared buffer numbers run from 1 to NBuffers, while descriptor indexes run from 0 to NBuffers-1.
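
Putting the two together, a small illustrative snippet (the variable buffer is assumed to hold a valid shared buffer number) of going from a buffer number to its descriptor and to its data page:

/* Illustrative only: map a shared buffer number to its descriptor and its page. */
Buffer		buffer = some_valid_buffer;					/* 1-based buffer number (assumed) */
BufferDesc *bufHdr = GetBufferDescriptor(buffer - 1);	/* descriptor index = buffer - 1 */
Page		page   = BufferGetPage(buffer);				/* BufferBlocks + (buffer - 1) * BLCKSZ */

Assert(BufferDescriptorGetBuffer(bufHdr) == buffer);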

The Main Work of InitBufferPool

InitBufferPool does three main things (a simplified sketch follows this list):

  1. Initialize BufferDescriptors.

  2. Initialize BufferBlocks.

  3. Initialize the buffer hash table.

    The buffer hash table is initialized by calling InitBufTable from StrategyInitialize. Its role is discussed below under shared buffer loading.
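
A simplified sketch of what the initialization amounts to (shared-memory allocation via ShmemInitStruct, the per-buffer I/O locks, and error handling are omitted; this is a paraphrase of InitBufferPool, not the verbatim source):

/* Simplified sketch of InitBufferPool's descriptor and freelist setup. */
for (i = 0; i < NBuffers; i++)
{
	BufferDesc *buf = GetBufferDescriptor(i);

	CLEAR_BUFFERTAG(buf->tag);				/* no block loaded yet */
	pg_atomic_init_u32(&buf->state, 0);		/* no flags, refcount 0, usagecount 0 */
	buf->wait_backend_pid = 0;
	buf->buf_id = i;

	/* initially every buffer is linked into the freelist */
	buf->freeNext = i + 1;

	LWLockInitialize(BufferDescriptorGetContentLock(buf), LWTRANCHE_BUFFER_CONTENT);
}
/* terminate the freelist */
GetBufferDescriptor(NBuffers - 1)->freeNext = FREENEXT_END_OF_LIST;

/* set up the freelist control structure and the buffer hash table */
StrategyInitialize(true);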

Shared Buffer Loading (Lookup)

When PostgreSQL reads or writes a physical block, it first loads the block into the shared buffer pool and then reads or writes the data there. Loading a physical block into the shared buffer pool is called shared buffer loading. ReadBuffer_common is the common entry point for all buffers; it defines the read path shared by local and shared buffers. The code is as follows:

static Buffer
ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
				  BlockNumber blockNum, ReadBufferMode mode,
				  BufferAccessStrategy strategy, bool *hit)
{
	BufferDesc *bufHdr;
	Block		bufBlock;
	bool		found;
	bool		isExtend;
	bool		isLocalBuf = SmgrIsTemp(smgr);

	*hit = false;

	/* Make sure we will have room to remember the buffer pin */
	ResourceOwnerEnlargeBuffers(CurrentResourceOwner);

	isExtend = (blockNum == P_NEW);

	TRACE_POSTGRESQL_BUFFER_READ_START(forkNum, blockNum,
									   smgr->smgr_rnode.node.spcNode,
									   smgr->smgr_rnode.node.dbNode,
									   smgr->smgr_rnode.node.relNode,
									   smgr->smgr_rnode.backend,
									   isExtend);

	/* Substitute proper block number if caller asked for P_NEW */
	if (isExtend)
		blockNum = smgrnblocks(smgr, forkNum);

	if (isLocalBuf)
	{
		bufHdr = LocalBufferAlloc(smgr, forkNum, blockNum, &found);
		if (found)
			pgBufferUsage.local_blks_hit++;
		else
			pgBufferUsage.local_blks_read++;
	}
	else
	{
		/*
		 * lookup the buffer.  IO_IN_PROGRESS is set if the requested block is
		 * not currently in memory.
		 */
		bufHdr = BufferAlloc(smgr, relpersistence, forkNum, blockNum,
							 strategy, &found);
		if (found)
			pgBufferUsage.shared_blks_hit++;
		else
			pgBufferUsage.shared_blks_read++;
	}

	/* At this point we do NOT hold any locks. */

	/* if it was already in the buffer pool, we're done */
	if (found)
	{
		if (!isExtend)
		{
			/* Just need to update stats before we exit */
			*hit = true;
			VacuumPageHit++;

			if (VacuumCostActive)
				VacuumCostBalance += VacuumCostPageHit;

			TRACE_POSTGRESQL_BUFFER_READ_DONE(forkNum, blockNum,
											  smgr->smgr_rnode.node.spcNode,
											  smgr->smgr_rnode.node.dbNode,
											  smgr->smgr_rnode.node.relNode,
											  smgr->smgr_rnode.backend,
											  isExtend,
											  found);

			/*
			 * In RBM_ZERO_AND_LOCK mode the caller expects the page to be
			 * locked on return.
			 */
			if (!isLocalBuf)
			{
				if (mode == RBM_ZERO_AND_LOCK)
					LWLockAcquire(BufferDescriptorGetContentLock(bufHdr),
								  LW_EXCLUSIVE);
				else if (mode == RBM_ZERO_AND_CLEANUP_LOCK)
					LockBufferForCleanup(BufferDescriptorGetBuffer(bufHdr));
			}

			return BufferDescriptorGetBuffer(bufHdr);
		}

		/*
		 * We get here only in the corner case where we are trying to extend
		 * the relation but we found a pre-existing buffer marked BM_VALID.
		 * This can happen because mdread doesn't complain about reads beyond
		 * EOF (when zero_damaged_pages is ON) and so a previous attempt to
		 * read a block beyond EOF could have left a "valid" zero-filled
		 * buffer.  Unfortunately, we have also seen this case occurring
		 * because of buggy Linux kernels that sometimes return an
		 * lseek(SEEK_END) result that doesn't account for a recent write. In
		 * that situation, the pre-existing buffer would contain valid data
		 * that we don't want to overwrite.  Since the legitimate case should
		 * always have left a zero-filled buffer, complain if not PageIsNew.
		 */
		bufBlock = isLocalBuf ? LocalBufHdrGetBlock(bufHdr) : BufHdrGetBlock(bufHdr);
		if (!PageIsNew((Page) bufBlock))
			ereport(ERROR,
			 (errmsg("unexpected data beyond EOF in block %u of relation %s",
					 blockNum, relpath(smgr->smgr_rnode, forkNum)),
			  errhint("This has been seen to occur with buggy kernels; consider updating your system.")));

		/*
		 * We *must* do smgrextend before succeeding, else the page will not
		 * be reserved by the kernel, and the next P_NEW call will decide to
		 * return the same page.  Clear the BM_VALID bit, do the StartBufferIO
		 * call that BufferAlloc didn't, and proceed.
		 */
		if (isLocalBuf)
		{
			/* Only need to adjust flags */
			uint32		buf_state = pg_atomic_read_u32(&bufHdr->state);

			Assert(buf_state & BM_VALID);
			buf_state &= ~BM_VALID;
			pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
		}
		else
		{
			/*
			 * Loop to handle the very small possibility that someone re-sets
			 * BM_VALID between our clearing it and StartBufferIO inspecting
			 * it.
			 */
			do
			{
				uint32		buf_state = LockBufHdr(bufHdr);

				Assert(buf_state & BM_VALID);
				buf_state &= ~BM_VALID;
				UnlockBufHdr(bufHdr, buf_state);
			} while (!StartBufferIO(bufHdr, true));
		}
	}

	/*
	 * if we have gotten to this point, we have allocated a buffer for the
	 * page but its contents are not yet valid.  IO_IN_PROGRESS is set for it,
	 * if it's a shared buffer.
	 *
	 * Note: if smgrextend fails, we will end up with a buffer that is
	 * allocated but not marked BM_VALID.  P_NEW will still select the same
	 * block number (because the relation didn't get any longer on disk) and
	 * so future attempts to extend the relation will find the same buffer (if
	 * it's not been recycled) but come right back here to try smgrextend
	 * again.
	 */
	Assert(!(pg_atomic_read_u32(&bufHdr->state) & BM_VALID));	/* spinlock not needed */

	bufBlock = isLocalBuf ? LocalBufHdrGetBlock(bufHdr) : BufHdrGetBlock(bufHdr);

	if (isExtend)
	{
		/* new buffers are zero-filled */
		MemSet((char *) bufBlock, 0, BLCKSZ);
		/* don't set checksum for all-zero page */
		smgrextend(smgr, forkNum, blockNum, (char *) bufBlock, false);

		/*
		 * NB: we're *not* doing a ScheduleBufferTagForWriteback here;
		 * although we're essentially performing a write. At least on linux
		 * doing so defeats the 'delayed allocation' mechanism, leading to
		 * increased file fragmentation.
		 */
	}
	else
	{
		/*
		 * Read in the page, unless the caller intends to overwrite it and
		 * just wants us to allocate a buffer.
		 */
		if (mode == RBM_ZERO_AND_LOCK || mode == RBM_ZERO_AND_CLEANUP_LOCK)
			MemSet((char *) bufBlock, 0, BLCKSZ);
		else
		{
			instr_time	io_start,
						io_time;

			if (track_io_timing)
				INSTR_TIME_SET_CURRENT(io_start);

			smgrread(smgr, forkNum, blockNum, (char *) bufBlock);

			if (track_io_timing)
			{
				INSTR_TIME_SET_CURRENT(io_time);
				INSTR_TIME_SUBTRACT(io_time, io_start);
				pgstat_count_buffer_read_time(INSTR_TIME_GET_MICROSEC(io_time));
				INSTR_TIME_ADD(pgBufferUsage.blk_read_time, io_time);
			}

			/* check for garbage data */
			if (!PageIsVerified((Page) bufBlock, blockNum))
			{
				if (mode == RBM_ZERO_ON_ERROR || zero_damaged_pages)
				{
					ereport(WARNING,
							(errcode(ERRCODE_DATA_CORRUPTED),
							 errmsg("invalid page in block %u of relation %s; zeroing out page",
									blockNum,
									relpath(smgr->smgr_rnode, forkNum))));
					MemSet((char *) bufBlock, 0, BLCKSZ);
				}
				else
					ereport(ERROR,
							(errcode(ERRCODE_DATA_CORRUPTED),
							 errmsg("invalid page in block %u of relation %s",
									blockNum,
									relpath(smgr->smgr_rnode, forkNum))));
			}
		}
	}

	/*
	 * In RBM_ZERO_AND_LOCK mode, grab the buffer content lock before marking
	 * the page as valid, to make sure that no other backend sees the zeroed
	 * page before the caller has had a chance to initialize it.
	 *
	 * Since no-one else can be looking at the page contents yet, there is no
	 * difference between an exclusive lock and a cleanup-strength lock. (Note
	 * that we cannot use LockBuffer() or LockBufferForCleanup() here, because
	 * they assert that the buffer is already valid.)
	 */
	if ((mode == RBM_ZERO_AND_LOCK || mode == RBM_ZERO_AND_CLEANUP_LOCK) &&
		!isLocalBuf)
	{
		LWLockAcquire(BufferDescriptorGetContentLock(bufHdr), LW_EXCLUSIVE);
	}

	if (isLocalBuf)
	{
		/* Only need to adjust flags */
		uint32		buf_state = pg_atomic_read_u32(&bufHdr->state);

		buf_state |= BM_VALID;
		pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
	}
	else
	{
		/* Set BM_VALID, terminate IO, and wake up any waiters */
		TerminateBufferIO(bufHdr, false, BM_VALID);
	}

	VacuumPageMiss++;
	if (VacuumCostActive)
		VacuumCostBalance += VacuumCostPageMiss;

	TRACE_POSTGRESQL_BUFFER_READ_DONE(forkNum, blockNum,
									  smgr->smgr_rnode.node.spcNode,
									  smgr->smgr_rnode.node.dbNode,
									  smgr->smgr_rnode.node.relNode,
									  smgr->smgr_rnode.backend,
									  isExtend,
									  found);

	return BufferDescriptorGetBuffer(bufHdr);
}

The code is long, but as far as loading is concerned there are only two steps:

  • Step 1: call BufferAlloc to obtain a buf from the shared buffer pool.

    The buf may already hold the block we need, in which case it can be returned directly.

    BufferAlloc's output parameter found indicates whether the buf already caches the requested block.

  • Step 2: if the buf does not already cache the block, call smgrread to read the block from disk into the buf. (The two steps are distilled into a short sketch after this list.)
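
Distilled into a short sketch (this is a paraphrase of the shared-buffer path of the function above, not new logic; extension, statistics, and error handling are omitted):

/* Paraphrase of ReadBuffer_common's shared-buffer path. */
bufHdr = BufferAlloc(smgr, relpersistence, forkNum, blockNum, strategy, &found);
if (!found)
{
	/* the block was not cached: read it from disk into the buffer's data page */
	bufBlock = BufHdrGetBlock(bufHdr);
	smgrread(smgr, forkNum, blockNum, (char *) bufBlock);
	TerminateBufferIO(bufHdr, false, BM_VALID);		/* mark the page valid and wake any waiters */
}
return BufferDescriptorGetBuffer(bufHdr);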

Clearly BufferAlloc is the core of the loading process. Before looking at the BufferAlloc code, let us debug it with a question in mind: if two processes need to load the same physical block at the same time, how do we guarantee the block is not loaded twice?

The Duplicate Loading Problem

To investigate this question, we first design the following test:

  1. Create a table.

    create table t1(a int);
    
  2. Insert one row; the table now contains one physical block.

    insert into t1 values(1);
    
  3. Restart the database, so that the block produced in step 2 is no longer in the shared buffer pool.

  4. Set a breakpoint in BufferAlloc.

  5. Open two client connections to PostgreSQL and run the query.

    select * from t1;
    

Remember the hash table initialized in InitBufferPool? This is where it takes center stage. The hash table acts as a buffer dictionary, keyed by the physical block's BufferTag, with the buffer's buf_id as the value. BufferAlloc proceeds as follows (a sketch of the tag construction and lookup follows this list):

  1. Build a BufferTag from the tablespace OID, database OID, relation OID, fork number, and block number of the physical block (see INIT_BUFFERTAG). As noted above, a BufferTag uniquely identifies a physical block, so it can be used as the key for a lookup in the hash table. If a buf_id is found, the requested block has already been loaded into the buffer pool and is returned directly (in the form of a BufferDesc).
  2. If the tag is not in the hash table, a free buffer must be found to load the block into. If a free buffer exists it is used; otherwise the replacement mechanism evicts one.
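
The tag construction and lookup elided in the excerpt below can be sketched as follows (the variable names match the real function; this is a paraphrase, not the verbatim source):

/* Sketch of how BufferAlloc builds the tag and looks it up. */
BufferTag	newTag;				/* identity of the requested block */
uint32		newHash;			/* hash value for newTag */
LWLock	   *newPartitionLock;	/* buffer partition lock covering newTag */
int			buf_id;

/* create the tag: relation file node + fork + block number */
INIT_BUFFERTAG(newTag, smgr->smgr_rnode.node, forkNum, blockNum);

/* hash the tag and determine which partition of the hash table (and which lock) it maps to */
newHash = BufTableHashCode(&newTag);
newPartitionLock = BufMappingPartitionLock(newHash);

LWLockAcquire(newPartitionLock, LW_SHARED);
buf_id = BufTableLookup(&newTag, newHash);		/* >= 0 means the block is already cached */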

The BufferAlloc code is as follows:

static BufferDesc *
BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
			BlockNumber blockNum,
			BufferAccessStrategy strategy,
			bool *foundPtr)
{
	/*** omitted ***/

	/* 
	 * see if the block is in the buffer pool already 
	 * Step 1: check whether the physical block is already in a buffer.
	 */
	LWLockAcquire(newPartitionLock, LW_SHARED);
	buf_id = BufTableLookup(&newTag, newHash);
	if (buf_id >= 0)
	{
        // found: return it directly
		/*** omitted ***/
		return buf;
	}
	
    /*** omitted ***/
	/* 
	 * Loop here in case we have to try another victim buffer 
	 */
	for (;;)
	{
		/*
		 * Ensure, while the spinlock's not yet held, that there's a free
		 * refcount entry.
		 */
		ReservePrivateRefCountEntry();

		/*
		 * Select a victim buffer.  The buffer is returned with its header
		 * spinlock still held!
		 * Step 2: obtain a free buffer.
		 */
		buf = StrategyGetBuffer(strategy, &buf_state);

		/*** omitted ***/

		/*
		 * To change the association of a valid buffer, we'll need to have
		 * exclusive lock on both the old and new mapping partitions.
		 */
		if (oldFlags & BM_TAG_VALID)
		{
			/*
			 * Need to compute the old tag's hashcode and partition lock ID.
			 * XXX is it worth storing the hashcode in BufferDesc so we need
			 * not recompute it here?  Probably not.
			 */
			oldTag = buf->tag;
			oldHash = BufTableHashCode(&oldTag);
			oldPartitionLock = BufMappingPartitionLock(oldHash);

			/*
			 * Must lock the lower-numbered partition first to avoid
			 * deadlocks.
			 */
			if (oldPartitionLock < newPartitionLock)
			{
				LWLockAcquire(oldPartitionLock, LW_EXCLUSIVE);
				LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
			}
			else if (oldPartitionLock > newPartitionLock)
			{
				LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
				LWLockAcquire(oldPartitionLock, LW_EXCLUSIVE);
			}
			else
			{
				/* only one partition, only one lock */
				LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
			}
		}
		else
		{
			/* if it wasn't valid, we need only the new partition */
			LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
			/* remember we have no old-partition lock or tag */
			oldPartitionLock = NULL;
			/* this just keeps the compiler quiet about uninit variables */
			oldHash = 0;
		}

		/*
		 * Try to make a hashtable entry for the buffer under its new tag.
		 * This could fail because while we were writing someone else
		 * allocated another buffer for the same block we want to read in.
		 * Note that we have not yet removed the hashtable entry for the old
		 * tag.
		 * Step 3: insert newTag into BufTable.
		 */
		buf_id = BufTableInsert(&newTag, newHash, buf->buf_id);

		if (buf_id >= 0)
		{
			/*
			 * Got a collision. Someone has already done what we were about to
			 * do. We'll just handle this as if it were found in the buffer
			 * pool in the first place.  First, give up the buffer we were
			 * planning to use.
			 *
			 * give up the buf we obtained above
			 */
			UnpinBuffer(buf, true);

			/* Can give up that buffer's mapping partition lock now */
			if (oldPartitionLock != NULL &&
				oldPartitionLock != newPartitionLock)
				LWLockRelease(oldPartitionLock);

			/* remaining code should match code at top of routine */

			buf = GetBufferDescriptor(buf_id);
			
            /*** pin the buf identified by buf_id ***/
			valid = PinBuffer(buf, strategy);

			/* Can release the mapping lock as soon as we've pinned it */
			LWLockRelease(newPartitionLock);

			*foundPtr = TRUE;

			if (!valid)
			{
				/*
				 * We can only get here if (a) someone else is still reading
				 * in the page, or (b) a previous read attempt failed.  We
				 * have to wait for any active read attempt to finish, and
				 * then set up our own read attempt if the page is still not
				 * BM_VALID.  StartBufferIO does it all.
				 */
				if (StartBufferIO(buf, true))
				{
					/*
					 * If we get here, previous attempts to read the buffer
					 * must have failed ... but we shall bravely try again.
					 */
					*foundPtr = FALSE;
				}
			}

			return buf;
		}

		/*
		 * Need to lock the buffer header too in order to change its tag.
		 */
		buf_state = LockBufHdr(buf);

		/*** omitted ***/
	}

	/*** omitted ***/

	return buf;
}

Now back to the question posed earlier: if two processes need to load the same physical block at the same time, how do we guarantee the block is not loaded twice? While debugging we find that, because the database has just been restarted, the block is certainly not in the buffer pool, so BufTableLookup in step 1 returns -1 for both processes and both proceed to step 2. At this point both processes have each obtained a buffer! Next comes step 3, which calls BufTableInsert to insert the obtained buf into the hash table (BufferTag as key, buf_id as value). Before inserting, however, each process first takes an exclusive lock on the hash table partition (the LWLockAcquire(newPartitionLock, LW_EXCLUSIVE) calls that precede BufTableInsert in the code above), so the two processes are serialized.

Process 1 then executes BufTableInsert, which returns a buf_id; since no matching BufferTag existed before the insertion, it returns -1. When process 2 executes BufTableInsert, the BufferTag has already been inserted by process 1, so BufTableInsert returns the buf_id associated with that BufferTag. Process 2 therefore gives up the buf it obtained from StrategyGetBuffer (the UnpinBuffer call in the collision branch above) and switches to the buf identified by the returned buf_id (the GetBufferDescriptor(buf_id) call that follows it).

As we can see, serializing insertions into the hash table prevents the same physical block from being loaded twice.

The Intermediate Loading State Problem

From the earlier description of ReadBuffer_common we know that, as far as loading goes, it has two steps:

  • Step 1: call BufferAlloc to obtain a buf
  • Step 2: call smgrread to read the physical block into the buf

In the duplicate-loading experiment, process 1 performs both steps. Process 2, running concurrently, then faces a problem: after process 1 finishes step 1, the BufferTag for the block already exists in the hash table and process 2 can see it. But at that moment process 1 may not have started step 2, may still be in the middle of it, or may have failed at it; in every case the physical block has not yet been read into the buf, so if process 2 simply used the buf, things would clearly go wrong.

That is why, in the collision branch of BufferAlloc shown above, valid = PinBuffer(buf, strategy) returns a valid flag. If valid is false, the block has not yet been loaded into the cache, so StartBufferIO is called to wait for the load to finish (the StartBufferIO(buf, true) call a few lines later; a sketch of StartBufferIO's logic follows the two cases below).

  • If process 1's load succeeds, process 2's StartBufferIO returns false. Process 2's BufferAlloc then returns the buf with the output parameter found set to true, meaning the buf already holds the needed block and no further load is required.
  • If process 1's load fails, process 2's StartBufferIO returns true. Process 2's BufferAlloc then returns the buf with found set to false, meaning the buf does not hold the needed block and smgrread must still read the physical block into it.
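
How does StartBufferIO implement this waiting? A simplified paraphrase of its logic for the read-in case is shown below (not the verbatim source; note that the io_in_progress lock is not part of the BufferDesc shown earlier but is reached through BufferDescriptorGetIOLock):

/* Simplified paraphrase of StartBufferIO(buf, true) when reading a block in. */
for (;;)
{
	LWLockAcquire(BufferDescriptorGetIOLock(buf), LW_EXCLUSIVE);
	buf_state = LockBufHdr(buf);
	if (!(buf_state & BM_IO_IN_PROGRESS))
		break;								/* nobody else is doing I/O on this buffer right now */
	UnlockBufHdr(buf, buf_state);
	LWLockRelease(BufferDescriptorGetIOLock(buf));
	WaitIO(buf);							/* wait for the other backend's I/O to finish */
}

if (buf_state & BM_VALID)
{
	/* the other backend's read succeeded: nothing left for us to do */
	UnlockBufHdr(buf, buf_state);
	LWLockRelease(BufferDescriptorGetIOLock(buf));
	return false;
}

/* the previous read failed (or never happened): claim the I/O for ourselves */
buf_state |= BM_IO_IN_PROGRESS;
UnlockBufHdr(buf, buf_state);
return true;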

The Buffer Acquisition Conflict Problem

Now consider another question: what if two processes need to load different physical blocks but end up acquiring the same buffer? The function that acquires a buffer is StrategyGetBuffer, which works as follows:

  1. If there is a free buffer, take one from the freelist; otherwise go to step 2.
  2. Evict a buffer using the replacement mechanism.

The code is as follows:

BufferDesc *
StrategyGetBuffer(BufferAccessStrategy strategy, uint32 *buf_state)
{
	BufferDesc *buf;
	int			bgwprocno;
	int			trycounter;
	uint32		local_buf_state;	/* to avoid repeated (de-)referencing */

	/*
	 * If given a strategy object, see whether it can select a buffer. We
	 * assume strategy objects don't need buffer_strategy_lock.
	 */
	if (strategy != NULL)
	{
		buf = GetBufferFromRing(strategy, buf_state);
		if (buf != NULL)
			return buf;
	}

	/*
	 * If asked, we need to waken the bgwriter. Since we don't want to rely on
	 * a spinlock for this we force a read from shared memory once, and then
	 * set the latch based on that value. We need to go through that length
	 * because otherwise bgprocno might be reset while/after we check because
	 * the compiler might just reread from memory.
	 *
	 * This can possibly set the latch of the wrong process if the bgwriter
	 * dies in the wrong moment. But since PGPROC->procLatch is never
	 * deallocated the worst consequence of that is that we set the latch of
	 * some arbitrary process.
	 */
	bgwprocno = INT_ACCESS_ONCE(StrategyControl->bgwprocno);
	if (bgwprocno != -1)
	{
		/* reset bgwprocno first, before setting the latch */
		StrategyControl->bgwprocno = -1;

		/*
		 * Not acquiring ProcArrayLock here which is slightly icky. It's
		 * actually fine because procLatch isn't ever freed, so we just can
		 * potentially set the wrong process' (or no process') latch.
		 */
		SetLatch(&ProcGlobal->allProcs[bgwprocno].procLatch);
	}

	/*
	 * We count buffer allocation requests so that the bgwriter can estimate
	 * the rate of buffer consumption.  Note that buffers recycled by a
	 * strategy object are intentionally not counted here.
	 */
	pg_atomic_fetch_add_u32(&StrategyControl->numBufferAllocs, 1);

	/*
	 * First check, without acquiring the lock, whether there's buffers in the
	 * freelist. Since we otherwise don't require the spinlock in every
	 * StrategyGetBuffer() invocation, it'd be sad to acquire it here -
	 * uselessly in most cases. That obviously leaves a race where a buffer is
	 * put on the freelist but we don't see the store yet - but that's pretty
	 * harmless, it'll just get used during the next buffer acquisition.
	 *
	 * If there's buffers on the freelist, acquire the spinlock to pop one
	 * buffer of the freelist. Then check whether that buffer is usable and
	 * repeat if not.
	 *
	 * Note that the freeNext fields are considered to be protected by the
	 * buffer_strategy_lock not the individual buffer spinlocks, so it's OK to
	 * manipulate them without holding the spinlock.
	 *
	 * Step 1: take a buffer from the freelist
	 *
	 */
	if (StrategyControl->firstFreeBuffer >= 0)
	{
		while (true)
		{
			/* 
			 * Acquire the spinlock to remove element from the freelist 
			 * take the spinlock
			 */
			SpinLockAcquire(&StrategyControl->buffer_strategy_lock);

			if (StrategyControl->firstFreeBuffer < 0)
			{
				SpinLockRelease(&StrategyControl->buffer_strategy_lock);
				break;
			}

			buf = GetBufferDescriptor(StrategyControl->firstFreeBuffer);
			Assert(buf->freeNext != FREENEXT_NOT_IN_LIST);

			/* Unconditionally remove buffer from freelist */
			StrategyControl->firstFreeBuffer = buf->freeNext;
			buf->freeNext = FREENEXT_NOT_IN_LIST;

			/*
			 * Release the lock so someone else can access the freelist while
			 * we check out this buffer.
			 */
			SpinLockRelease(&StrategyControl->buffer_strategy_lock);

			/*
			 * If the buffer is pinned or has a nonzero usage_count, we cannot
			 * use it; discard it and retry.  (This can only happen if VACUUM
			 * put a valid buffer in the freelist and then someone else used
			 * it before we got to it.  It's probably impossible altogether as
			 * of 8.3, but we'd better check anyway.)
			 */
			local_buf_state = LockBufHdr(buf);
			if (BUF_STATE_GET_REFCOUNT(local_buf_state) == 0
				&& BUF_STATE_GET_USAGECOUNT(local_buf_state) == 0)
			{
				if (strategy != NULL)
					AddBufferToRing(strategy, buf);
				*buf_state = local_buf_state;
				return buf;
			}
			UnlockBufHdr(buf, local_buf_state);

		}
	}

	/* Nothing on the freelist, so run the "clock sweep" algorithm 
	 * Step 2: evict a buffer using the replacement mechanism
	 */
	trycounter = NBuffers;
	for (;;)
	{
		buf = GetBufferDescriptor(ClockSweepTick());

		/*
		 * If the buffer is pinned or has a nonzero usage_count, we cannot use
		 * it; decrement the usage_count (unless pinned) and keep scanning.
		 * take the buffer header lock
		 */
		local_buf_state = LockBufHdr(buf);

		if (BUF_STATE_GET_REFCOUNT(local_buf_state) == 0)
		{
			if (BUF_STATE_GET_USAGECOUNT(local_buf_state) != 0)
			{
				local_buf_state -= BUF_USAGECOUNT_ONE;

				trycounter = NBuffers;
			}
			else
			{
				/* Found a usable buffer */
				if (strategy != NULL)
					AddBufferToRing(strategy, buf);
				*buf_state = local_buf_state;
				return buf;
			}
		}
		else if (--trycounter == 0)
		{
			/*
			 * We've scanned all the buffers without making any state changes,
			 * so all the buffers are pinned (or were when we looked at them).
			 * We could hope that someone will free one eventually, but it's
			 * probably better to fail than to risk getting stuck in an
			 * infinite loop.
			 */
			UnlockBufHdr(buf, local_buf_state);
			elog(ERROR, "no unpinned buffers available");
		}
		UnlockBufHdr(buf, local_buf_state);
	}
}

Note that both step 1 and step 2 operate under locks (the freelist is popped while holding the buffer_strategy_lock spinlock, and the candidate buffer is examined while holding its header lock), so two processes cannot end up acquiring the same buffer.

The Concurrency Control Problem

Let us now study BufferAlloc's concurrency control. First we simplify the BufferAlloc code, keeping only the key skeleton:

static BufferDesc *
BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
			BlockNumber blockNum,
			BufferAccessStrategy strategy,
			bool *foundPtr)
{
	/*.....omitted....*/
	/* 
	 * see if the block is in the buffer pool already 
	 * Step 1: lock BufTable.
	 */
	LWLockAcquire(newPartitionLock, LW_SHARED);
    // Step 2: check whether the physical block is already in a buffer.
	buf_id = BufTableLookup(&newTag, newHash);
	if (buf_id >= 0)
	{
		/*.....
		 * the physical block is already in a buffer: return it directly.
		 * ....*/
		return buf;
	}

	/*
	 * Didn't find it in the buffer pool.  We'll have to initialize a new
	 * buffer.  Remember to unlock the mapping lock while doing the work.
	 * Step 3: unlock BufTable
	 */
	LWLockRelease(newPartitionLock);

	/* 
	 * Loop here in case we have to try another victim buffer 
	 */
	for (;;)
	{
		/*
		 * Ensure, while the spinlock's not yet held, that there's a free
		 * refcount entry.
		 */
		ReservePrivateRefCountEntry();

		/*
		 * Select a victim buffer.  The buffer is returned with its header
		 * spinlock still held!
		 * Step 4: obtain a buf from the shared buffer pool according to the strategy.
		 */
		buf = StrategyGetBuffer(strategy, &buf_state);

		Assert(BUF_STATE_GET_REFCOUNT(buf_state) == 0);

		/* Must copy buffer flags while we still hold the spinlock */
		oldFlags = buf_state & BUF_FLAG_MASK;

		/* Pin the buffer and then release the buffer spinlock */
		PinBuffer_Locked(buf);

		/*
		 * Step 5: if the data in this buf has not been flushed to disk yet, flush it.
		 */
		if (oldFlags & BM_DIRTY)
		{
			/*** omitted ***/
		}

		/*
		 * To change the association of a valid buffer, we'll need to have
		 * exclusive lock on both the old and new mapping partitions.
		 * Step 6: lock BufTable
		 */
		if (oldFlags & BM_TAG_VALID)
		{
			if (oldPartitionLock < newPartitionLock)
			{
				LWLockAcquire(oldPartitionLock, LW_EXCLUSIVE);
				LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
			}
			else if (oldPartitionLock > newPartitionLock)
			{
				LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
				LWLockAcquire(oldPartitionLock, LW_EXCLUSIVE);
			}
			else
			{
				/* only one partition, only one lock */
				LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
			}
		}
		else
		{
			/* if it wasn't valid, we need only the new partition */
			LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
			/* remember we have no old-partition lock or tag */
			oldPartitionLock = NULL;
			/* this just keeps the compiler quiet about uninit variables */
			oldHash = 0;
		}

		/*
		 * Try to make a hashtable entry for the buffer under its new tag.
		 * This could fail because while we were writing someone else
		 * allocated another buffer for the same block we want to read in.
		 * Note that we have not yet removed the hashtable entry for the old
		 * tag.
		 * Step 7: insert the buf into BufTable
		 */
		buf_id = BufTableInsert(&newTag, newHash, buf->buf_id);

		if (buf_id >= 0)
		{
			/*** omitted; this omitted code is very important and is explained in detail below ***/
			return buf;
		}

		/*
		 * Need to lock the buffer header too in order to change its tag.
		 */
		buf_state = LockBufHdr(buf);

		/*
		 * Somebody could have pinned or re-dirtied the buffer while we were
		 * doing the I/O and making the new hashtable entry.  If so, we can't
		 * recycle this buffer; we must undo everything we've done and start
		 * over with a new victim buffer.
		 * Step 8: during the preceding steps another process may have pinned or re-dirtied
		 * this buf; if so, the buf cannot be used and the whole loop must start over.
		 */
		oldFlags = buf_state & BUF_FLAG_MASK;
		if (BUF_STATE_GET_REFCOUNT(buf_state) == 1 && !(oldFlags & BM_DIRTY))
			break;

		UnlockBufHdr(buf, buf_state);
		BufTableDelete(&newTag, newHash);
		if (oldPartitionLock != NULL &&
			oldPartitionLock != newPartitionLock)
			LWLockRelease(oldPartitionLock);
		LWLockRelease(newPartitionLock);
		UnpinBuffer(buf, true);
	}

	/*
	 * Okay, it's finally safe to rename the buffer.
	 *
	 * Clearing BM_VALID here is necessary, clearing the dirtybits is just
	 * paranoia.  We also reset the usage_count since any recency of use of
	 * the old content is no longer relevant.  (The usage_count starts out at
	 * 1 so that the buffer can survive one clock-sweep pass.)
	 *
	 * Make sure BM_PERMANENT is set for buffers that must be written at every
	 * checkpoint.  Unlogged buffers only need to be written at shutdown
	 * checkpoints, except for their "init" forks, which need to be treated
	 * just like permanent relations.
	 */
	buf->tag = newTag;
	buf_state &= ~(BM_VALID | BM_DIRTY | BM_JUST_DIRTIED |
				   BM_CHECKPOINT_NEEDED | BM_IO_ERROR | BM_PERMANENT |
				   BUF_USAGECOUNT_MASK);
	if (relpersistence == RELPERSISTENCE_PERMANENT || forkNum == INIT_FORKNUM)
		buf_state |= BM_TAG_VALID | BM_PERMANENT | BUF_USAGECOUNT_ONE;
	else
		buf_state |= BM_TAG_VALID | BUF_USAGECOUNT_ONE;

	UnlockBufHdr(buf, buf_state);

    // Step 9: delete the evicted block's tag from the hash table
	if (oldPartitionLock != NULL)
	{
		BufTableDelete(&oldTag, oldHash);
		if (oldPartitionLock != newPartitionLock)
			LWLockRelease(oldPartitionLock);
	}

    // Step 10: done; unlock BufTable
	LWLockRelease(newPartitionLock);

	/*
	 * Buffer contents are currently invalid.  Try to get the io_in_progress
	 * lock.  If StartBufferIO returns false, then someone else managed to
	 * read it before we did, so there's nothing left for BufferAlloc() to do.
	 */
	if (StartBufferIO(buf, true))
		*foundPtr = FALSE;
	else
		*foundPtr = TRUE;

	return buf;
}

Now let us walk through BufferAlloc's flow:

  • Step 1: lock BufTable

    Why this lock is taken is discussed later.

  • Step 2: check whether the block to be loaded (the block for newTag) is already cached.

    If it is, return it directly; otherwise continue with step 3.

  • Step 3: unlock BufTable

    Why this unlock matters is discussed later.

  • Step 4: obtain a buf from the shared buffer pool according to the strategy

    This buf may be a free buffer, or it may be a buffer currently holding an evicted block (oldTag).

  • Step 5: check whether the buf obtained in step 4 has been flushed to disk

    The buf from step 4 may hold an evicted block whose data has not yet been written out, so step 5 must check and, if the buf is dirty, flush it.

  • Step 6: lock BufTable

    Again, why this lock is needed is discussed later.

  • Step 7: insert the buf under newTag into BufTable

    If the insertion finds that the same newTag already exists in BufTable, some other process has already loaded this block.

  • Step 8: check whether the buf obtained earlier has meanwhile been taken by someone else

    If so, we must give up this buf and undo step 7 (BufTableDelete(&newTag, newHash)).

  • Step 9: delete the evicted block (oldTag) from BufTable

  • Step 10: unlock BufTable

    The unlock points are discussed next.

Now let us look at the lock/unlock operations on BufTable in the flow above:

  • The lock in step 1: step 2 needs to query BufTable, so a read (shared) lock on BufTable is required.
  • The unlock in step 3: this unlock is crucial. Strictly for correctness, step 3 need not unlock at all; but the following steps 4 and 5 can take quite a while, and step 5 in particular may involve flushing to disk, so for the sake of concurrency BufTable must be unlocked before steps 4 and 5.
  • The lock in step 6: step 7 writes newTag into BufTable, so a write (exclusive) lock on BufTable is required.
  • The unlock in step 10: the whole flow is finished, so the BufTable lock can be released.

The most critical point in this locking flow is step 3, the unlock done for performance. It improves the concurrency of BufTable, but what problems does it bring?

Looking at the ten steps, notice that in step 4 we obtain a buf that may still hold a block no process was accessing (which is why StrategyGetBuffer chose it as the victim). At that point the block's oldTag has not yet been removed from BufTable; that removal only happens in step 9. Since BufTable is not locked while the current process runs steps 4 and 5, the block behind oldTag can still be found, accessed, and modified by other processes. Therefore, after steps 6 and 7, before deciding to reuse the buf, the current process must check whether the block data in the buf has been modified or is being used by another process. If so, it must give up this buf and look for another one.

This flow looks rather convoluted, so could step 9 be moved earlier? For example, into the following order:

  • Step 1: lock BufTable

  • Step 2: check whether the block to be loaded (the block for newTag) is already cached.

  • Step 3 (originally step 4): obtain a buf from the shared buffer pool according to the strategy

  • Step 4 (originally step 9): delete the evicted block (oldTag) from BufTable

  • Step 5 (originally step 3): unlock BufTable

  • Step 6: check whether the buf obtained in step 3 has been flushed to disk

  • Step 7: lock BufTable

  • Step 8: insert the buf under newTag into BufTable

  • Step 9: check whether the buf has meanwhile been taken by someone else (no longer needed)

  • Step 10: unlock BufTable

With this change, as soon as we obtain the buf we immediately remove its old tag from BufTable (steps 3 and 4), so no other process can start using that block afterwards. But this creates worse problems:

  • Problem 1: we cannot unlock BufTable until after obtaining the buf (calling StrategyGetBuffer) and deleting oldTag, which reduces BufTable's concurrency (though it may still be slightly better than unlocking only at the very end).
  • Problem 2: if step 8 discovers that the block has already been loaded by another process (BufTableInsert returns a non-negative buf_id), then the deletion in step 4 was wasted. Worse than wasted, in fact: it amounts to evicting a block from the cache for no reason, and anyone who needs that block later must reload it.

Summary

For BufferAlloc's scenario, PostgreSQL's flow is the better design. From a pure cache-management point of view, though, if concurrency requirements are modest and no flushing to disk is involved, the modified flow is also worth considering, since the logic in the original step 9 for deciding whether the buf has been taken by another process is fairly complex.

Shared Buffer Replacement Strategies

The number of buffers in the pool is fixed at initialization (given by NBuffers, 1000 by default in the build described here) and never changes afterwards. So as operations accumulate, the buffers may all be in use, and some recently unused buffers must be evicted to make room for the requested file blocks.

PostgreSQL provides two buffer replacement strategies: the normal replacement strategy and the buffer ring replacement strategy. In the StrategyGetBuffer code above, the ring strategy is implemented via GetBufferFromRing (the if (strategy != NULL) block at the top of the function); the rest of the code implements the normal strategy. We discuss the two strategies in turn.

The Normal Replacement Strategy

The two steps of the normal replacement strategy were already mentioned above; here they are described in more detail.

Taking a free buffer when one is available

The buffer pool maintains a FreeList, a singly linked list. Buffers on the FreeList are chained through the freeNext field of their descriptors, and the BufferStrategyControl structure records the first and last elements of the list. Buffers enter the FreeList at initialization and again whenever a buffer is invalidated (for example, when its relation is dropped); when a free buffer is needed, one is taken from the head of the list (a sketch of how a buffer is returned to the FreeList follows the structure below). BufferStrategyControl is defined as follows:

typedef struct
{
	/* Spinlock: protects the values below */
	slock_t		buffer_strategy_lock;

	/*
	 * Clock sweep hand: index of next buffer to consider grabbing. Note that
	 * this isn't a concrete buffer - we only ever increase the value. So, to
	 * get an actual buffer, it needs to be used modulo NBuffers.
	 */
	pg_atomic_uint32 nextVictimBuffer;

	int			firstFreeBuffer;	/* Head of list of unused buffers */
	int			lastFreeBuffer; /* Tail of list of unused buffers */

	/*
	 * NOTE: lastFreeBuffer is undefined when firstFreeBuffer is -1 (that is,
	 * when the list is empty)
	 */

	/*
	 * Statistics.  These counters should be wide enough that they can't
	 * overflow during a single bgwriter cycle.
	 */
	uint32		completePasses; /* Complete cycles of the clock sweep */
	pg_atomic_uint32 numBufferAllocs;	/* Buffers allocated since last reset */

	/*
	 * Bgworker process to be notified upon activity or -1 if none. See
	 * StrategyNotifyBgWriter.
	 */
	int			bgwprocno;
} BufferStrategyControl;
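
For completeness, a buffer goes back onto the freelist through StrategyFreeBuffer, which is called, for example, when a buffer is invalidated because its relation or database was dropped. A sketch of that function, close to the version in freelist.c:

/* Sketch of StrategyFreeBuffer: push a no-longer-needed buffer onto the freelist head. */
void
StrategyFreeBuffer(BufferDesc *buf)
{
	SpinLockAcquire(&StrategyControl->buffer_strategy_lock);

	/* Do nothing if the buffer is already on the freelist. */
	if (buf->freeNext == FREENEXT_NOT_IN_LIST)
	{
		buf->freeNext = StrategyControl->firstFreeBuffer;
		if (buf->freeNext < 0)
			StrategyControl->lastFreeBuffer = buf->buf_id;
		StrategyControl->firstFreeBuffer = buf->buf_id;
	}

	SpinLockRelease(&StrategyControl->buffer_strategy_lock);
}
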
Evicting a buffer with the replacement mechanism

The replacement mechanism is a simple clock-sweep algorithm. Its main flow is:

  1. Initialize trycounter = NBuffers.
  2. Use the nextVictimBuffer field (initially 0) to locate the corresponding buffer.
  3. Advance nextVictimBuffer by 1; when it points past the last buffer in the pool, wrap it back to 0.
  4. If the buffer found in step 2 has refcount equal to 0:
    a. If usagecount is nonzero, decrement usagecount and reset trycounter to NBuffers.
    b. Otherwise, take this buffer and return it.
  5. If the buffer found in step 2 has a nonzero refcount, decrement trycounter; if trycounter reaches 0, raise an error.
  6. Go back to step 2.

To see this more clearly, the clock-sweep part of StrategyGetBuffer is listed again below, with comments mapping to the steps above:

trycounter = NBuffers;		/* step 1 */
for (;;)
{
    /* steps 2-3 */
	buf = GetBufferDescriptor(ClockSweepTick());
	/*
	 * If the buffer is pinned or has a nonzero usage_count, we cannot use
	 * it; decrement the usage_count (unless pinned) and keep scanning.
	 */
	local_buf_state = LockBufHdr(buf);
    
    /* step 4: is refcount 0? */
	if (BUF_STATE_GET_REFCOUNT(local_buf_state) == 0)
	{
        /* is usagecount 0? */
		if (BUF_STATE_GET_USAGECOUNT(local_buf_state) != 0)
		{
            /* usagecount is nonzero: decrement it and reset trycounter to NBuffers */
			local_buf_state -= BUF_USAGECOUNT_ONE;
			trycounter = NBuffers;
		}
		else
		{
			/* usagecount is 0: take this buffer and return it */
			if (strategy != NULL)	/* if a ring strategy is in use, add the buffer to the ring */
				AddBufferToRing(strategy, buf);
			*buf_state = local_buf_state;
			return buf;
		}
	}
	else if (--trycounter == 0)
	{
        /* step 5 */
		/*
		 * We've scanned all the buffers without making any state changes,
		 * so all the buffers are pinned (or were when we looked at them).
		 * We could hope that someone will free one eventually, but it's
		 * probably better to fail than to risk getting stuck in an
		 * infinite loop.
		 */
		UnlockBufHdr(buf, local_buf_state);
		elog(ERROR, "no unpinned buffers available");
	}
	UnlockBufHdr(buf, local_buf_state);
    /* step 6: continue the loop */
}

Core idea:
In any database, the core idea of cache replacement is to evict the pages that are accessed least often. PostgreSQL expresses a page's access frequency through usage_count: it starts at 1 when a block is loaded into a buffer (the BUF_USAGECOUNT_ONE set in BufferAlloc above) and is incremented each time the buffer is pinned, up to a small maximum. The more frequently a page is accessed, the larger its usage_count, the harder it is for the clock sweep to drive it down to 0, and the less likely the page is to be evicted.
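
The clock hand itself is advanced by ClockSweepTick, which is not shown in the excerpt above. A simplified sketch of what it does (the real function also maintains the completePasses counter and handles wraparound of the atomic counter; those details are omitted here):

/* Simplified sketch of ClockSweepTick(): advance the hand and map it onto a buffer index. */
static inline uint32
ClockSweepTick(void)
{
	uint32		victim;

	/* atomically advance the clock hand; no lock is needed for this */
	victim = pg_atomic_fetch_add_u32(&StrategyControl->nextVictimBuffer, 1);

	/* the hand is a monotonically increasing counter, so fold it onto the pool */
	return victim % NBuffers;
}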

The Buffer Ring Replacement Strategy

The buffer ring is an optimization of the normal replacement strategy. Consider the following scenario: several processes are performing routine operations on the database when one of them starts a full-table scan. The scan touches a large number of physical blocks, but each block only once. Under the normal strategy, the scan would fill the buffer pool with pages that will be used only once, evicting many pages that would have been used repeatedly, which clearly defeats the purpose of the buffer pool, namely reducing I/O. To handle this, the ring strategy allocates a fixed, small set of buffers and performs replacement within that set first; only if none of them can be replaced does it fall back to the normal strategy. The ring is controlled by the BufferAccessStrategy structure (an example of obtaining and using a strategy follows the definition), which is defined as follows:

typedef struct BufferAccessStrategyData
{
	/* Overall strategy type (which kind of ring this is) */
	BufferAccessStrategyType btype;
	/* Number of elements in buffers[] array (the ring size) */
	int			ring_size;

	/*
	 * Index of the "current" slot in the ring, ie, the one most recently
	 * returned by GetBufferFromRing.
	 * (the slot of the buffer most recently handed out from the ring)
	 */
	int			current;

	/*
	 * True if the buffer just returned by StrategyGetBuffer had been in the
	 * ring already.
	 * (whether the buffer last returned by StrategyGetBuffer came straight from the ring)
	 */
	bool		current_was_in_ring;

	/*
	 * Array of buffer numbers.  InvalidBuffer (that is, zero) indicates we
	 * have not yet selected a buffer for this ring slot.  For allocation
	 * simplicity this is palloc'd together with the fixed fields of the
	 * struct.
	 * (array holding the buffer numbers that have been added to the ring)
	 */
	Buffer		buffers[FLEXIBLE_ARRAY_MEMBER];
}	BufferAccessStrategyData;
typedef struct BufferAccessStrategyData *BufferAccessStrategy;
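
Callers do not fill in this structure by hand. A ring is obtained from GetAccessStrategy with one of the BufferAccessStrategyType values and is then passed down through the read path. A minimal illustrative usage (rel and blkno are assumed to exist):

/* Illustrative: read a block through a bulk-read ring strategy. */
BufferAccessStrategy strategy = GetAccessStrategy(BAS_BULKREAD);
Buffer		buf;

buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL, strategy);
/* ... use the buffer ... */
ReleaseBuffer(buf);

FreeAccessStrategy(strategy);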

The ring replacement strategy is implemented in GetBufferFromRing, which has three steps:

  1. Advance the strategy's current index to the next element of its buffers array (the candidate next buffer); if it already points at the last element, wrap current back to 0 (the first element).

  2. Examine the element that current points to. If it holds InvalidBuffer, the ring is not yet full and this slot has no buffer recorded; in that case set the strategy's current_was_in_ring field to false and return NULL.

    When the caller (StrategyGetBuffer) sees a NULL return value, it falls back to the normal replacement strategy to obtain a free buffer and then adds that buffer to the ring via AddBufferToRing.

  3. If the element holds a valid buffer number, check that buffer's refcount and usagecount. If refcount is 0 and usagecount <= 1 (used at most once recently, and that one use was most likely this process's own access during the scan), the buffer is reclaimed and returned. Otherwise the buffer is still in use by, or was recently used by, another process; as in step 2, the caller then falls back to the normal replacement strategy to obtain a free buffer.

In short: take the buffer at the next ring slot; if it is a valid buffer that no process is accessing and that has been used at most once recently, return it; otherwise fall back to the normal replacement strategy.

The code is as follows:

static BufferDesc *
GetBufferFromRing(BufferAccessStrategy strategy, uint32 *buf_state)
{
	BufferDesc *buf;
	Buffer		bufnum;
	uint32		local_buf_state;	/* to avoid repeated (de-)referencing */


	/* Advance to next ring slot */
	if (++strategy->current >= strategy->ring_size)
		strategy->current = 0;

	/*
	 * If the slot hasn't been filled yet, tell the caller to allocate a new
	 * buffer with the normal allocation strategy.  He will then fill this
	 * slot by calling AddBufferToRing with the new buffer.
	 */
	bufnum = strategy->buffers[strategy->current];
	if (bufnum == InvalidBuffer)
	{
		strategy->current_was_in_ring = false;
		return NULL;
	}

	/*
	 * If the buffer is pinned we cannot use it under any circumstances.
	 *
	 * If usage_count is 0 or 1 then the buffer is fair game (we expect 1,
	 * since our own previous usage of the ring element would have left it
	 * there, but it might've been decremented by clock sweep since then). A
	 * higher usage_count indicates someone else has touched the buffer, so we
	 * shouldn't re-use it.
	 */
	buf = GetBufferDescriptor(bufnum - 1);
	local_buf_state = LockBufHdr(buf);
	if (BUF_STATE_GET_REFCOUNT(local_buf_state) == 0
		&& BUF_STATE_GET_USAGECOUNT(local_buf_state) <= 1)
	{
		strategy->current_was_in_ring = true;
		*buf_state = local_buf_state;
		return buf;
	}
	UnlockBufHdr(buf, local_buf_state);

	/*
	 * Tell caller to allocate a new buffer with the normal allocation
	 * strategy.  He'll then replace this ring element via AddBufferToRing.
	 */
	strategy->current_was_in_ring = false;
	return NULL;
}
AddBufferToRing

AddBufferToRing was mentioned above; let us look at its implementation:

static void
AddBufferToRing(BufferAccessStrategy strategy, BufferDesc *buf)
{
	strategy->buffers[strategy->current] = BufferDescriptorGetBuffer(buf);
}

This function is trivial: it stores the buffer's buffer number (the descriptor's index plus 1, as returned by BufferDescriptorGetBuffer) into the ring at the slot indicated by current.
