References
PostgreSQL数据库内核分析 (PostgreSQL Database Kernel Analysis), 彭智勇 and 彭煜玮, pp. 99-101
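Overview
In PostgreSQL, every operation on tables, tuples, indexes and so on takes place in a buffer pool. Data moves in and out of the pool in units of disk blocks: smgrread reads a required block from disk into the pool, and smgrwrite writes buffer contents back to disk. The slot that holds a loaded disk block is called a buffer, and the buffers together make up the buffer pool.
PostgreSQL has two kinds of buffer pool: the shared buffer pool and the local buffer pool. The shared buffer pool is where ordinary tables are worked on, while the local buffer pool serves temporary tables, which are visible only to the backend that owns them. This article covers the shared buffer pool only.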
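Buffers in the pool are managed through two mechanisms:
-
pin
Before a process accesses a buffer, it pins the buffer. The number of pins is kept in the buffer's refcount attribute. A nonzero refcount means some process is accessing the buffer, so the buffer must not be replaced.
-
lock
Locks make concurrent access to a buffer safe: a process takes an EXCLUSIVE lock to write to a buffer and a SHARE lock to read it. An Insert, for example, must take an EXCLUSIVE lock on the buffer once it has obtained it. (The locking is done in RelationGetBufferForTuple; see the insert path for details.)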
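The pin rule above can be illustrated with a minimal sketch. ToyBuffer, toy_pin, toy_unpin and toy_can_replace are hypothetical names made up for this example; the real code keeps the count inside the atomic state field and also tracks pins per backend, which is left out here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified buffer header: only the pin count is modelled. */
typedef struct ToyBuffer
{
    int refcount;   /* number of processes currently pinning the buffer */
} ToyBuffer;

/* Pin: announce that the caller is about to use the buffer. */
static void toy_pin(ToyBuffer *buf)
{
    buf->refcount++;
}

/* Unpin: the caller is done with the buffer. */
static void toy_unpin(ToyBuffer *buf)
{
    assert(buf->refcount > 0);
    buf->refcount--;
}

/* A buffer may be picked as a replacement victim only when nobody pins it. */
static bool toy_can_replace(const ToyBuffer *buf)
{
    return buf->refcount == 0;
}
```

A buffer pinned twice stays non-replaceable until both pins are released.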
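Initializing the shared buffer pool
The shared buffer pool is initialized by InitBufferPool. Shared buffer management uses a global array, BufferDescriptors, whose elements are of type BufferDesc, to manage the buffers in the pool. A global pointer, BufferBlocks, stores the start address of the pool.
Let us first look at the definition of BufferDesc: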
typedef struct BufferDesc
{
BufferTag tag; /* ID of page contained in buffer */
int buf_id; /* buffer's index number (from 0) */
/* state of the tag, containing flags, refcount and usagecount */
pg_atomic_uint32 state;
int wait_backend_pid; /* backend PID of pin-count waiter */
int freeNext; /* link in freelist chain */
LWLock content_lock; /* to lock access to buffer contents */
} BufferDesc;
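Where:
-
tag: identifies the physical block held in the buffer. It is defined as follows:
typedef struct buftag
{
RelFileNode rnode; /* tablespace OID, database OID and relation file node */
ForkNumber forkNum; /* enum: which fork (kind of file) the block belongs to */
BlockNumber blockNum; /* block number */
} BufferTag;
A tag uniquely identifies one physical block. Note that it identifies a physical block, not a buffer. (The tag is used again in the buffer loading flow below.)
-
buf_id: the index number of the buffer. buf_id uniquely identifies a buffer, and every operation on a buffer uses it.
Shared and local buffers both use buf_id but number it differently: shared buffers are numbered from 0 upward in steps of 1, while local buffers are numbered from -2 downward in steps of 1.
/* local buffers are numbered from -2 downward */
#define LocalBufHdrGetBlock(bufHdr) LocalBufferBlockPointers[-((bufHdr)->buf_id + 2)]
-
state: a single atomic 32-bit value made up of flags, refcount and usagecount.
- flags: flag bits, for example whether the buffer is dirty.
- refcount: the number of processes currently referencing the buffer; it is modified by pin and unpin operations.
- usagecount: how often the buffer has been used recently; it drives buffer replacement.
-
wait_backend_pid: the PID of a backend that is waiting on the buffer's pin count (the "pin-count waiter" in the code comment), typically because it needs to be the only process holding a pin before it modifies the buffer.
-
freeNext: if the buffer is on the free list, freeNext points to the next free buffer.
-
content_lock: a process that accesses the buffer contents takes content_lock, in LW_SHARED mode for reads and in LW_EXCLUSIVE mode for writes. The lock prevents conflicting accesses by several processes from leaving the data inconsistent.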
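To make the packed state field more concrete, here is a small sketch of how three values can share one 32-bit word. The bit layout (18 bits of refcount, 4 bits of usagecount, 10 bits of flags) is an assumption modelled on PostgreSQL's buf_internals.h and should be checked against the version being studied; the TOY_ names and the position of the dirty bit are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: bits 0-17 refcount | bits 18-21 usagecount | bits 22-31 flags. */
#define TOY_REFCOUNT_MASK    ((1U << 18) - 1)
#define TOY_USAGECOUNT_SHIFT 18
#define TOY_USAGECOUNT_ONE   (1U << TOY_USAGECOUNT_SHIFT)
#define TOY_USAGECOUNT_MASK  0x003C0000U
#define TOY_FLAG_MASK        0xFFC00000U
#define TOY_FLAG_DIRTY       (1U << 23)   /* hypothetical flag bit */

/* Extract the pin count from a state word. */
static uint32_t toy_refcount(uint32_t state)
{
    return state & TOY_REFCOUNT_MASK;
}

/* Extract the usage count from a state word. */
static uint32_t toy_usagecount(uint32_t state)
{
    return (state & TOY_USAGECOUNT_MASK) >> TOY_USAGECOUNT_SHIFT;
}

/* Extract the flag bits from a state word. */
static uint32_t toy_flags(uint32_t state)
{
    return state & TOY_FLAG_MASK;
}
```

With this layout, adding 1 to the word pins the buffer and subtracting TOY_USAGECOUNT_ONE lowers the usage count, which mirrors the `local_buf_state -= BUF_USAGECOUNT_ONE` statement in StrategyGetBuffer.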
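Buffer operations
As noted above, shared buffer management uses two global variables: the BufferDesc array BufferDescriptors and the pointer BufferBlocks. How are the two related, and how does one convert between them?
First, BufferDescriptors is an array of N elements, where N is the number of buffers in the pool (the default cited here is 1000; at run time the value follows the shared_buffers setting). BufferBlocks is one contiguous region of memory of size BLCKSZ*N, so it can also be viewed as an array of N elements, each of which is one buffer.
BufferDesc has a member buf_id, which is the index of that BufferDesc within BufferDescriptors, that is:
BufferDesc == BufferDescriptors[BufferDesc->buf_id].
So buf_id is enough both to fetch the BufferDesc from BufferDescriptors and to locate the actual buffer in BufferBlocks. The following macros do the work: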
/* return a buffer number; all later operations work on this number */
#define BufferDescriptorGetBuffer(bdesc) ((bdesc)->buf_id + 1)
/* fetch a BufferDesc from BufferDescriptors */
#define GetBufferDescriptor(id) (&BufferDescriptors[(id)].bufferdesc)
/* fetch a buffer from BufferBlocks */
#define BufferGetPage(buffer) ((Page)BufferGetBlock(buffer))
#define BufferIsLocal(buffer) ((buffer) < 0) /* is this a local buffer? */
#define BufferGetBlock(buffer) \
( \
AssertMacro(BufferIsValid(buffer)), \
BufferIsLocal(buffer) ? \
LocalBufferBlockPointers[-(buffer) - 1] \
: \
(Block) (BufferBlocks + ((Size) ((buffer) - 1)) * BLCKSZ) \
)
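GetBufferDescriptor takes the array index of the BufferDesc directly, whereas BufferGetBlock must be given the return value of BufferDescriptorGetBuffer, that is, the array index plus 1. The reason is not spelled out in this code. A plausible explanation is that the Buffer value 0 is reserved as InvalidBuffer and negative values denote local buffers (see BufferIsLocal above), so valid shared Buffer numbers have to start at 1.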
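The index arithmetic can be tried out on a toy pool. This is only a sketch under stated assumptions: ToyDesc, ToyBufNum, the toy_ functions, the pool size of 4 and the block size of 8192 are all invented for the example, and local (negative) buffer numbers are not modelled.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_BLCKSZ   8192   /* assumed block size */
#define TOY_NBUFFERS 4      /* tiny pool, for illustration only */

typedef int ToyBufNum;      /* 0 plays the role of the invalid buffer number */

typedef struct ToyDesc
{
    int buf_id;             /* index of this descriptor in toy_descriptors */
} ToyDesc;

static ToyDesc toy_descriptors[TOY_NBUFFERS];
static char    toy_blocks[TOY_NBUFFERS * TOY_BLCKSZ];

/* Give every descriptor its own array index as buf_id. */
static void toy_init(void)
{
    for (int i = 0; i < TOY_NBUFFERS; i++)
        toy_descriptors[i].buf_id = i;
}

/* Descriptor -> buffer number (index + 1). */
static ToyBufNum toy_desc_get_buffer(const ToyDesc *d)
{
    return d->buf_id + 1;
}

/* Array index -> descriptor. */
static ToyDesc *toy_get_descriptor(int id)
{
    return &toy_descriptors[id];
}

/* Buffer number -> start address of its block (number - 1 is the index). */
static char *toy_buffer_get_block(ToyBufNum buffer)
{
    assert(buffer >= 1 && buffer <= TOY_NBUFFERS);
    return toy_blocks + (size_t) (buffer - 1) * TOY_BLCKSZ;
}
```

Descriptor index 2 maps to buffer number 3, whose block starts 2 * TOY_BLCKSZ bytes into the pool.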
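What InitBufferPool does
InitBufferPool does three main things:
-
It initializes BufferDescriptors.
-
It initializes BufferBlocks.
-
It initializes the buffer hash table.
The buffer hash table is initialized by InitBufTable, which is called from StrategyInitialize. Its purpose is explained in the section on loading shared buffers.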
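As a rough illustration of the descriptor part of this initialization, the sketch below sets up buf_id and chains every descriptor into a free list through freeNext. It assumes that all buffers start out free and are linked in index order, which the text above does not state explicitly; ToyPool and the TOY_ constants are hypothetical, and the hash table and shared-memory allocation are left out.

```c
#include <assert.h>

#define TOY_NBUFFERS 4
#define TOY_FREENEXT_END_OF_LIST (-1)   /* assumed end-of-list marker */

typedef struct ToyDesc
{
    int buf_id;     /* index of the buffer, counted from 0 */
    int freeNext;   /* next buffer on the free list */
} ToyDesc;

typedef struct ToyPool
{
    ToyDesc descriptors[TOY_NBUFFERS];
    int     firstFreeBuffer;   /* head of the free list */
    int     lastFreeBuffer;    /* tail of the free list */
} ToyPool;

/* Number the descriptors and link all of them into one free list. */
static void toy_init_pool(ToyPool *pool)
{
    for (int i = 0; i < TOY_NBUFFERS; i++)
    {
        pool->descriptors[i].buf_id = i;
        pool->descriptors[i].freeNext = i + 1;
    }
    /* the last descriptor terminates the chain */
    pool->descriptors[TOY_NBUFFERS - 1].freeNext = TOY_FREENEXT_END_OF_LIST;
    pool->firstFreeBuffer = 0;
    pool->lastFreeBuffer = TOY_NBUFFERS - 1;
}
```

Walking the chain from firstFreeBuffer visits every buffer exactly once.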
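Loading a shared buffer (lookup)
When PostgreSQL reads or writes a physical block, it first reads the block into a shared buffer and then reads or writes the data in that buffer. Reading a physical block into a shared buffer is called loading the shared buffer. ReadBuffer_common is the common routine for all buffers; it defines the read path shared by local and shared buffers. The code is as follows: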
static Buffer
ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
BlockNumber blockNum, ReadBufferMode mode,
BufferAccessStrategy strategy, bool *hit)
{
BufferDesc *bufHdr;
Block bufBlock;
bool found;
bool isExtend;
bool isLocalBuf = SmgrIsTemp(smgr);
*hit = false;
/* Make sure we will have room to remember the buffer pin */
ResourceOwnerEnlargeBuffers(CurrentResourceOwner);
isExtend = (blockNum == P_NEW);
TRACE_POSTGRESQL_BUFFER_READ_START(forkNum, blockNum,
smgr->smgr_rnode.node.spcNode,
smgr->smgr_rnode.node.dbNode,
smgr->smgr_rnode.node.relNode,
smgr->smgr_rnode.backend,
isExtend);
/* Substitute proper block number if caller asked for P_NEW */
if (isExtend)
blockNum = smgrnblocks(smgr, forkNum);
if (isLocalBuf)
{
bufHdr = LocalBufferAlloc(smgr, forkNum, blockNum, &found);
if (found)
pgBufferUsage.local_blks_hit++;
else
pgBufferUsage.local_blks_read++;
}
else
{
/*
* lookup the buffer. IO_IN_PROGRESS is set if the requested block is
* not currently in memory.
*/
bufHdr = BufferAlloc(smgr, relpersistence, forkNum, blockNum,
strategy, &found);
if (found)
pgBufferUsage.shared_blks_hit++;
else
pgBufferUsage.shared_blks_read++;
}
/* At this point we do NOT hold any locks. */
/* if it was already in the buffer pool, we're done */
if (found)
{
if (!isExtend)
{
/* Just need to update stats before we exit */
*hit = true;
VacuumPageHit++;
if (VacuumCostActive)
VacuumCostBalance += VacuumCostPageHit;
TRACE_POSTGRESQL_BUFFER_READ_DONE(forkNum, blockNum,
smgr->smgr_rnode.node.spcNode,
smgr->smgr_rnode.node.dbNode,
smgr->smgr_rnode.node.relNode,
smgr->smgr_rnode.backend,
isExtend,
found);
/*
* In RBM_ZERO_AND_LOCK mode the caller expects the page to be
* locked on return.
*/
if (!isLocalBuf)
{
if (mode == RBM_ZERO_AND_LOCK)
LWLockAcquire(BufferDescriptorGetContentLock(bufHdr),
LW_EXCLUSIVE);
else if (mode == RBM_ZERO_AND_CLEANUP_LOCK)
LockBufferForCleanup(BufferDescriptorGetBuffer(bufHdr));
}
return BufferDescriptorGetBuffer(bufHdr);
}
/*
* We get here only in the corner case where we are trying to extend
* the relation but we found a pre-existing buffer marked BM_VALID.
* This can happen because mdread doesn't complain about reads beyond
* EOF (when zero_damaged_pages is ON) and so a previous attempt to
* read a block beyond EOF could have left a "valid" zero-filled
* buffer. Unfortunately, we have also seen this case occurring
* because of buggy Linux kernels that sometimes return an
* lseek(SEEK_END) result that doesn't account for a recent write. In
* that situation, the pre-existing buffer would contain valid data
* that we don't want to overwrite. Since the legitimate case should
* always have left a zero-filled buffer, complain if not PageIsNew.
*/
bufBlock = isLocalBuf ? LocalBufHdrGetBlock(bufHdr) : BufHdrGetBlock(bufHdr);
if (!PageIsNew((Page) bufBlock))
ereport(ERROR,
(errmsg("unexpected data beyond EOF in block %u of relation %s",
blockNum, relpath(smgr->smgr_rnode, forkNum)),
errhint("This has been seen to occur with buggy kernels; consider updating your system.")));
/*
* We *must* do smgrextend before succeeding, else the page will not
* be reserved by the kernel, and the next P_NEW call will decide to
* return the same page. Clear the BM_VALID bit, do the StartBufferIO
* call that BufferAlloc didn't, and proceed.
*/
if (isLocalBuf)
{
/* Only need to adjust flags */
uint32 buf_state = pg_atomic_read_u32(&bufHdr->state);
Assert(buf_state & BM_VALID);
buf_state &= ~BM_VALID;
pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
}
else
{
/*
* Loop to handle the very small possibility that someone re-sets
* BM_VALID between our clearing it and StartBufferIO inspecting
* it.
*/
do
{
uint32 buf_state = LockBufHdr(bufHdr);
Assert(buf_state & BM_VALID);
buf_state &= ~BM_VALID;
UnlockBufHdr(bufHdr, buf_state);
} while (!StartBufferIO(bufHdr, true));
}
}
/*
* if we have gotten to this point, we have allocated a buffer for the
* page but its contents are not yet valid. IO_IN_PROGRESS is set for it,
* if it's a shared buffer.
*
* Note: if smgrextend fails, we will end up with a buffer that is
* allocated but not marked BM_VALID. P_NEW will still select the same
* block number (because the relation didn't get any longer on disk) and
* so future attempts to extend the relation will find the same buffer (if
* it's not been recycled) but come right back here to try smgrextend
* again.
*/
Assert(!(pg_atomic_read_u32(&bufHdr->state) & BM_VALID)); /* spinlock not needed */
bufBlock = isLocalBuf ? LocalBufHdrGetBlock(bufHdr) : BufHdrGetBlock(bufHdr);
if (isExtend)
{
/* new buffers are zero-filled */
MemSet((char *) bufBlock, 0, BLCKSZ);
/* don't set checksum for all-zero page */
smgrextend(smgr, forkNum, blockNum, (char *) bufBlock, false);
/*
* NB: we're *not* doing a ScheduleBufferTagForWriteback here;
* although we're essentially performing a write. At least on linux
* doing so defeats the 'delayed allocation' mechanism, leading to
* increased file fragmentation.
*/
}
else
{
/*
* Read in the page, unless the caller intends to overwrite it and
* just wants us to allocate a buffer.
*/
if (mode == RBM_ZERO_AND_LOCK || mode == RBM_ZERO_AND_CLEANUP_LOCK)
MemSet((char *) bufBlock, 0, BLCKSZ);
else
{
instr_time io_start,
io_time;
if (track_io_timing)
INSTR_TIME_SET_CURRENT(io_start);
smgrread(smgr, forkNum, blockNum, (char *) bufBlock);
if (track_io_timing)
{
INSTR_TIME_SET_CURRENT(io_time);
INSTR_TIME_SUBTRACT(io_time, io_start);
pgstat_count_buffer_read_time(INSTR_TIME_GET_MICROSEC(io_time));
INSTR_TIME_ADD(pgBufferUsage.blk_read_time, io_time);
}
/* check for garbage data */
if (!PageIsVerified((Page) bufBlock, blockNum))
{
if (mode == RBM_ZERO_ON_ERROR || zero_damaged_pages)
{
ereport(WARNING,
(errcode(ERRCODE_DATA_CORRUPTED),
errmsg("invalid page in block %u of relation %s; zeroing out page",
blockNum,
relpath(smgr->smgr_rnode, forkNum))));
MemSet((char *) bufBlock, 0, BLCKSZ);
}
else
ereport(ERROR,
(errcode(ERRCODE_DATA_CORRUPTED),
errmsg("invalid page in block %u of relation %s",
blockNum,
relpath(smgr->smgr_rnode, forkNum))));
}
}
}
/*
* In RBM_ZERO_AND_LOCK mode, grab the buffer content lock before marking
* the page as valid, to make sure that no other backend sees the zeroed
* page before the caller has had a chance to initialize it.
*
* Since no-one else can be looking at the page contents yet, there is no
* difference between an exclusive lock and a cleanup-strength lock. (Note
* that we cannot use LockBuffer() or LockBufferForCleanup() here, because
* they assert that the buffer is already valid.)
*/
if ((mode == RBM_ZERO_AND_LOCK || mode == RBM_ZERO_AND_CLEANUP_LOCK) &&
!isLocalBuf)
{
LWLockAcquire(BufferDescriptorGetContentLock(bufHdr), LW_EXCLUSIVE);
}
if (isLocalBuf)
{
/* Only need to adjust flags */
uint32 buf_state = pg_atomic_read_u32(&bufHdr->state);
buf_state |= BM_VALID;
pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
}
else
{
/* Set BM_VALID, terminate IO, and wake up any waiters */
TerminateBufferIO(bufHdr, false, BM_VALID);
}
VacuumPageMiss++;
if (VacuumCostActive)
VacuumCostBalance += VacuumCostPageMiss;
TRACE_POSTGRESQL_BUFFER_READ_DONE(forkNum, blockNum,
smgr->smgr_rnode.node.spcNode,
smgr->smgr_rnode.node.dbNode,
smgr->smgr_rnode.node.relNode,
smgr->smgr_rnode.backend,
isExtend,
found);
return BufferDescriptorGetBuffer(bufHdr);
}
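The code is long, but as far as loading is concerned there are only two steps:
-
Step 1: call BufferAlloc to obtain a buf from the shared buffer pool. The buf may already hold the block we need, in which case it can be returned directly.
The output parameter found of BufferAlloc tells whether the buf already holds the block.
-
Step 2: if the buf does not hold the block, call smgrread to read the block from disk into the buf.
Clearly BufferAlloc is the heart of the loading process. Before reading its code, let us bring one question to the debugging session: if two processes need to load the same physical block at the same time, what guarantees that the block is not loaded twice?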
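The two-step shape of the load path can be sketched with a toy cache. Everything here is hypothetical: ToyCache, toy_alloc and toy_read are invented names, the "disk" is an int array, and the victim choice is a placeholder rather than PostgreSQL's replacement policy.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NBUFFERS 2

typedef struct ToyCache
{
    int block_of[TOY_NBUFFERS];  /* block cached in each slot, -1 if empty */
    int data[TOY_NBUFFERS];      /* stand-in for the page contents */
    int disk_reads;              /* how many times the "disk" was read */
} ToyCache;

/* Mark every slot as empty. */
static void toy_cache_init(ToyCache *c)
{
    for (int i = 0; i < TOY_NBUFFERS; i++)
    {
        c->block_of[i] = -1;
        c->data[i] = 0;
    }
    c->disk_reads = 0;
}

/* Step 1: find or assign a slot; *found says whether the block was cached. */
static int toy_alloc(ToyCache *c, int block, bool *found)
{
    int slot = -1;

    assert(block >= 0);
    for (int i = 0; i < TOY_NBUFFERS; i++)
        if (c->block_of[i] == block)
        {
            *found = true;
            return i;
        }
    for (int i = 0; i < TOY_NBUFFERS; i++)
        if (c->block_of[i] == -1)
        {
            slot = i;
            break;
        }
    if (slot < 0)
        slot = block % TOY_NBUFFERS;   /* placeholder victim choice */
    c->block_of[slot] = block;
    *found = false;
    return slot;
}

/* Step 2: read from disk only when step 1 reported a miss. */
static int toy_read(ToyCache *c, const int *disk, int block)
{
    bool found;
    int  slot = toy_alloc(c, block, &found);

    if (!found)
    {
        c->data[slot] = disk[block];
        c->disk_reads++;
    }
    return c->data[slot];
}
```

Reading the same block twice touches the disk only once, which is the point of the found flag.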
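The duplicate-load problem
To investigate, we set up the following test:
-
Create a table.
create table t1(a int);
-
Insert one row, so that the table contains one physical block.
insert into t1 values(1);
-
Restart the database, so that the block produced in step 2 is not in the shared buffer pool.
-
Set a breakpoint in BufferAlloc.
-
Open two client connections to PostgreSQL and run the query in each.
select * from t1;
Remember the hash table that InitBufferPool set up? This is where it comes in. The hash table acts as a dictionary of buffers, with the BufferTag of a physical block as the key and the buf_id of a buffer as the value. BufferAlloc proceeds as follows:
- It builds a BufferTag from the tablespace OID, database OID, relation identifier and related information of the block (see INIT_BUFFERTAG). As noted earlier, a BufferTag uniquely identifies a physical block, so it can serve as the lookup key in the hash table. If a buf_id is found, the requested block is already in the buffer pool and is returned directly (in the form of a BufferDesc).
- If the tag is not in the hash table, a free buffer must be found to hold the block. If a free buffer exists it is returned; otherwise the replacement mechanism selects a buffer to replace.
The code of BufferAlloc is as follows: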
static BufferDesc *
BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
BlockNumber blockNum,
BufferAccessStrategy strategy,
bool *foundPtr)
{
/*** omitted ***/
/*
* see if the block is in the buffer pool already
* Step 1: check whether the physical block is already in a buffer.
*/
LWLockAcquire(newPartitionLock, LW_SHARED);
buf_id = BufTableLookup(&newTag, newHash);
if (buf_id >= 0)
{
/* found: return it directly */
/*** omitted ***/
return buf;
}
/*** omitted ***/
/*
* Loop here in case we have to try another victim buffer
*/
for (;;)
{
/*
* Ensure, while the spinlock's not yet held, that there's a free
* refcount entry.
*/
ReservePrivateRefCountEntry();
/*
* Select a victim buffer. The buffer is returned with its header
* spinlock still held!
* Step 2: obtain a free buffer.
*/
buf = StrategyGetBuffer(strategy, &buf_state);
/*** omitted ***/
/*
* To change the association of a valid buffer, we'll need to have
* exclusive lock on both the old and new mapping partitions.
*/
if (oldFlags & BM_TAG_VALID)
{
/*
* Need to compute the old tag's hashcode and partition lock ID.
* XXX is it worth storing the hashcode in BufferDesc so we need
* not recompute it here? Probably not.
*/
oldTag = buf->tag;
oldHash = BufTableHashCode(&oldTag);
oldPartitionLock = BufMappingPartitionLock(oldHash);
/*
* Must lock the lower-numbered partition first to avoid
* deadlocks.
*/
if (oldPartitionLock < newPartitionLock)
{
LWLockAcquire(oldPartitionLock, LW_EXCLUSIVE);
LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
}
else if (oldPartitionLock > newPartitionLock)
{
LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
LWLockAcquire(oldPartitionLock, LW_EXCLUSIVE);
}
else
{
/* only one partition, only one lock */
LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
}
}
else
{
/* if it wasn't valid, we need only the new partition */
LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
/* remember we have no old-partition lock or tag */
oldPartitionLock = NULL;
/* this just keeps the compiler quiet about uninit variables */
oldHash = 0;
}
/*
* Try to make a hashtable entry for the buffer under its new tag.
* This could fail because while we were writing someone else
* allocated another buffer for the same block we want to read in.
* Note that we have not yet removed the hashtable entry for the old
* tag.
* Step 3: insert newTag into the BufTable.
*/
buf_id = BufTableInsert(&newTag, newHash, buf->buf_id);
if (buf_id >= 0)
{
/*
* Got a collision. Someone has already done what we were about to
* do. We'll just handle this as if it were found in the buffer
* pool in the first place. First, give up the buffer we were
* planning to use.
*
* Give up the buf we obtained.
*/
UnpinBuffer(buf, true);
/* Can give up that buffer's mapping partition lock now */
if (oldPartitionLock != NULL &&
oldPartitionLock != newPartitionLock)
LWLockRelease(oldPartitionLock);
/* remaining code should match code at top of routine */
buf = GetBufferDescriptor(buf_id);
/* pin the buf that buf_id refers to */
valid = PinBuffer(buf, strategy);
/* Can release the mapping lock as soon as we've pinned it */
LWLockRelease(newPartitionLock);
*foundPtr = TRUE;
if (!valid)
{
/*
* We can only get here if (a) someone else is still reading
* in the page, or (b) a previous read attempt failed. We
* have to wait for any active read attempt to finish, and
* then set up our own read attempt if the page is still not
* BM_VALID. StartBufferIO does it all.
*/
if (StartBufferIO(buf, true))
{
/*
* If we get here, previous attempts to read the buffer
* must have failed ... but we shall bravely try again.
*/
*foundPtr = FALSE;
}
}
return buf;
}
/*
* Need to lock the buffer header too in order to change its tag.
*/
buf_state = LockBufHdr(buf);
/*** omitted ***/
}
/*** omitted ***/
return buf;
}
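Now back to the question raised earlier: if two processes need to load the same physical block at the same time, what guarantees that the block is not loaded twice? While debugging we observe that, because the database has just been restarted, the block cannot be in the buffer pool, so BufTableLookup in step 1 returns -1 and execution moves on to step 2. At this point both processes have obtained a buffer of their own. Step 3 then calls BufTableInsert to insert the obtained buf into the hash table (BufferTag as key, buf_id as value). Before the insert, however, each process takes an exclusive lock on the hash table partition (the LWLockAcquire(..., LW_EXCLUSIVE) calls just before step 3 in the code above), so the two processes are serialized.
Process 1 then runs BufTableInsert, which returns a buf_id. Because the BufferTag was not in the hash table before the insert, it returns -1. When process 2 runs BufTableInsert, the BufferTag has already been inserted by process 1, so BufTableInsert returns the buf_id already associated with that BufferTag. Process 2 then gives up the buf it obtained from StrategyGetBuffer (the UnpinBuffer(buf, true) call in the code above) and switches to the buf that buf_id refers to (the GetBufferDescriptor(buf_id) call).
In short, serializing inserts into the hash table is what prevents the same physical block from being loaded twice.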
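The insert-or-return-existing behaviour that resolves the race can be sketched in a few lines. This is a toy model, not PostgreSQL's dynahash-based BufTable: it uses a linear scan instead of hashing, ToyTag has only two fields, and the caller is assumed to hold the exclusive lock that serializes the two processes.

```c
#include <assert.h>

#define TOY_TABLE_SIZE 8

typedef struct ToyTag   { unsigned rel; unsigned block; } ToyTag;
typedef struct ToyEntry { int used; ToyTag tag; int buf_id; } ToyEntry;
typedef struct ToyBufTable { ToyEntry entries[TOY_TABLE_SIZE]; } ToyBufTable;

static int toy_tag_equal(const ToyTag *a, const ToyTag *b)
{
    return a->rel == b->rel && a->block == b->block;
}

/* Return the buf_id stored for tag, or -1 if the tag is not present. */
static int toy_table_lookup(const ToyBufTable *t, const ToyTag *tag)
{
    for (int i = 0; i < TOY_TABLE_SIZE; i++)
        if (t->entries[i].used && toy_tag_equal(&t->entries[i].tag, tag))
            return t->entries[i].buf_id;
    return -1;
}

/*
 * Insert tag -> buf_id. Returns -1 if the entry was inserted, or the
 * buf_id already stored for tag if someone else inserted it first.
 * The caller is assumed to hold an exclusive lock on the table.
 */
static int toy_table_insert(ToyBufTable *t, const ToyTag *tag, int buf_id)
{
    int existing = toy_table_lookup(t, tag);

    if (existing >= 0)
        return existing;
    for (int i = 0; i < TOY_TABLE_SIZE; i++)
        if (!t->entries[i].used)
        {
            t->entries[i].used = 1;
            t->entries[i].tag = *tag;
            t->entries[i].buf_id = buf_id;
            return -1;
        }
    assert(!"toy table full");
    return -1;
}
```

The first insert for a tag returns -1 (process 1 wins); a second insert for the same tag returns the winner's buf_id, which tells process 2 to drop its own buffer.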
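The half-loaded state problem
From the earlier description of ReadBuffer_common we know that, as far as loading is concerned, it has two steps:
- Step 1: call BufferAlloc to obtain a buf.
- Step 2: call smgrread to read the physical block into the buf.
In the duplicate-load experiment, process 1 performs exactly these two steps. Process 2, running concurrently, then faces a problem: once process 1 has finished step 1, the hash table already contains the BufferTag of the block, and process 2 can see it. Yet process 1 may not have started step 2, may be in the middle of it, or may have failed in it. In all of these cases the block has not been read into the buf, so process 2 would clearly run into trouble if it used the buf directly.
This is why, at the call valid = PinBuffer(buf, strategy); in the code above, the pin operation returns valid. If valid is false, the block has not yet been loaded into the buffer, so the process calls StartBufferIO to wait for the load to finish (the StartBufferIO(buf, true) call in the code above).
- If process 1 loaded the block successfully, StartBufferIO in process 2 returns false. BufferAlloc in process 2 then returns the buf with the output parameter found set to true, meaning the buf already holds the required block and no further load is needed.
- If process 1 failed to load the block, StartBufferIO in process 2 returns true. BufferAlloc in process 2 then returns the buf with found set to false, meaning the buf does not hold the required block and smgrread must be called to read the block into the buf.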
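The two outcomes can be condensed into a small decision function. This is a sketch of the logic only: ToyBuf and the toy_ functions are hypothetical, and the waiting that the real StartBufferIO performs while another backend's I/O is in progress is replaced by an assertion that the other I/O has already finished.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct ToyBuf
{
    bool valid;            /* page contents have been read successfully */
    bool io_in_progress;   /* some process is currently reading the page */
} ToyBuf;

/*
 * Returns true if the caller must perform the read itself, false if the
 * page is already valid. The real code would wait here while another
 * process is still reading; the sketch assumes that wait is over.
 */
static bool toy_start_io(ToyBuf *buf)
{
    assert(!buf->io_in_progress);
    if (buf->valid)
        return false;
    buf->io_in_progress = true;
    return true;
}

/* What BufferAlloc reports as found after losing the insert race. */
static bool toy_found_after_collision(ToyBuf *buf)
{
    bool found = true;

    if (!buf->valid && toy_start_io(buf))
        found = false;   /* the earlier read failed, so we must retry it */
    return found;
}
```

A valid buffer yields found == true; an invalid one with no read in flight yields found == false and leaves the caller responsible for the read.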
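The buffer acquisition conflict problem
Consider another question: what if two processes need to load different physical blocks but obtain the same buffer? Buffers are obtained by StrategyGetBuffer, which works as follows:
- If a free buffer exists, take one. Otherwise go to step 2.
- Use the replacement mechanism to pick a buffer to replace.
The code is as follows: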
BufferDesc *
StrategyGetBuffer(BufferAccessStrategy strategy, uint32 *buf_state)
{
BufferDesc *buf;
int bgwprocno;
int trycounter;
uint32 local_buf_state; /* to avoid repeated (de-)referencing */
/*
* If given a strategy object, see whether it can select a buffer. We
* assume strategy objects don't need buffer_strategy_lock.
*/
if (strategy != NULL)
{
buf = GetBufferFromRing(strategy, buf_state);
if (buf != NULL)
return buf;
}
/*
* If asked, we need to waken the bgwriter. Since we don't want to rely on
* a spinlock for this we force a read from shared memory once, and then
* set the latch based on that value. We need to go through that length
* because otherwise bgprocno might be reset while/after we check because
* the compiler might just reread from memory.
*
* This can possibly set the latch of the wrong process if the bgwriter
* dies in the wrong moment. But since PGPROC->procLatch is never
* deallocated the worst consequence of that is that we set the latch of
* some arbitrary process.
*/
bgwprocno = INT_ACCESS_ONCE(StrategyControl->bgwprocno);
if (bgwprocno != -1)
{
/* reset bgwprocno first, before setting the latch */
StrategyControl->bgwprocno = -1;
/*
* Not acquiring ProcArrayLock here which is slightly icky. It's
* actually fine because procLatch isn't ever freed, so we just can
* potentially set the wrong process' (or no process') latch.
*/
SetLatch(&ProcGlobal->allProcs[bgwprocno].procLatch);
}
/*
* We count buffer allocation requests so that the bgwriter can estimate
* the rate of buffer consumption. Note that buffers recycled by a
* strategy object are intentionally not counted here.
*/
pg_atomic_fetch_add_u32(&StrategyControl->numBufferAllocs, 1);
/*
* First check, without acquiring the lock, whether there's buffers in the
* freelist. Since we otherwise don't require the spinlock in every
* StrategyGetBuffer() invocation, it'd be sad to acquire it here -
* uselessly in most cases. That obviously leaves a race where a buffer is
* put on the freelist but we don't see the store yet - but that's pretty
* harmless, it'll just get used during the next buffer acquisition.
*
* If there's buffers on the freelist, acquire the spinlock to pop one
* buffer of the freelist. Then check whether that buffer is usable and
* repeat if not.
*
* Note that the freeNext fields are considered to be protected by the
* buffer_strategy_lock not the individual buffer spinlocks, so it's OK to
* manipulate them without holding the spinlock.
*
* Step 1: take a free buffer
*
*/
if (StrategyControl->firstFreeBuffer >= 0)
{
while (true)
{
/*
* Acquire the spinlock to remove element from the freelist
* (take the lock)
*/
SpinLockAcquire(&StrategyControl->buffer_strategy_lock);
if (StrategyControl->firstFreeBuffer < 0)
{
SpinLockRelease(&StrategyControl->buffer_strategy_lock);
break;
}
buf = GetBufferDescriptor(StrategyControl->firstFreeBuffer);
Assert(buf->freeNext != FREENEXT_NOT_IN_LIST);
/* Unconditionally remove buffer from freelist */
StrategyControl->firstFreeBuffer = buf->freeNext;
buf->freeNext = FREENEXT_NOT_IN_LIST;
/*
* Release the lock so someone else can access the freelist while
* we check out this buffer.
*/
SpinLockRelease(&StrategyControl->buffer_strategy_lock);
/*
* If the buffer is pinned or has a nonzero usage_count, we cannot
* use it; discard it and retry. (This can only happen if VACUUM
* put a valid buffer in the freelist and then someone else used
* it before we got to it. It's probably impossible altogether as
* of 8.3, but we'd better check anyway.)
*/
local_buf_state = LockBufHdr(buf);
if (BUF_STATE_GET_REFCOUNT(local_buf_state) == 0
&& BUF_STATE_GET_USAGECOUNT(local_buf_state) == 0)
{
if (strategy != NULL)
AddBufferToRing(strategy, buf);
*buf_state = local_buf_state;
return buf;
}
UnlockBufHdr(buf, local_buf_state);
}
}
/* Nothing on the freelist, so run the "clock sweep" algorithm
* Step 2: use the replacement mechanism to pick a buffer
*/
trycounter = NBuffers;
for (;;)
{
buf = GetBufferDescriptor(ClockSweepTick());
/*
* If the buffer is pinned or has a nonzero usage_count, we cannot use
* it; decrement the usage_count (unless pinned) and keep scanning.
* (take the lock)
*/
local_buf_state = LockBufHdr(buf);
if (BUF_STATE_GET_REFCOUNT(local_buf_state) == 0)
{
if (BUF_STATE_GET_USAGECOUNT(local_buf_state) != 0)
{
local_buf_state -= BUF_USAGECOUNT_ONE;
trycounter = NBuffers;
}
else
{
/* Found a usable buffer */
if (strategy != NULL)
AddBufferToRing(strategy, buf);
*buf_state = local_buf_state;
return buf;
}
}
else if (--trycounter == 0)
{
/*
* We've scanned all the buffers without making any state changes,
* so all the buffers are pinned (or were when we looked at them).
* We could hope that someone will free one eventually, but it's
* probably better to fail than to risk getting stuck in an
* infinite loop.
*/
UnlockBufHdr(buf, local_buf_state);
elog(ERROR, "no unpinned buffers available");
}
UnlockBufHdr(buf, local_buf_state);
}
}
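Note that both step 1 and step 2 take a lock: the free list is popped under buffer_strategy_lock, and each candidate is examined under its header spinlock. The chosen buffer is returned with the header spinlock still held, and the caller pins it before releasing that lock, so two processes cannot end up with the same buffer.
Concurrency control
Let us now study the concurrency control in BufferAlloc. First we simplify its code down to the key skeleton: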
static BufferDesc *
BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
BlockNumber blockNum,
BufferAccessStrategy strategy,
bool *foundPtr)
{
/* ... omitted ... */
/*
* see if the block is in the buffer pool already
* Step 1: lock the BufTable.
*/
LWLockAcquire(newPartitionLock, LW_SHARED);
/* Step 2: check whether the physical block is already in a buffer. */
buf_id = BufTableLookup(&newTag, newHash);
if (buf_id >= 0)
{
/* ...
* The block is already in a buffer; return it directly.
* ... */
return buf;
}
/*
* Didn't find it in the buffer pool. We'll have to initialize a new
* buffer. Remember to unlock the mapping lock while doing the work.
* Step 3: unlock the BufTable.
*/
LWLockRelease(newPartitionLock);
/*
* Loop here in case we have to try another victim buffer
*/
for (;;)
{
/*
* Ensure, while the spinlock's not yet held, that there's a free
* refcount entry.
*/
ReservePrivateRefCountEntry();
/*
* Select a victim buffer. The buffer is returned with its header
* spinlock still held!
* Step 4: obtain a buf from the shared buffer pool according to the strategy.
*/
buf = StrategyGetBuffer(strategy, &buf_state);
Assert(BUF_STATE_GET_REFCOUNT(buf_state) == 0);
/* Must copy buffer flags while we still hold the spinlock */
oldFlags = buf_state & BUF_FLAG_MASK;
/* Pin the buffer and then release the buffer spinlock */
PinBuffer_Locked(buf);
/*
* Step 5: if the data in the buf has not been flushed to disk, flush it.
*/
if (oldFlags & BM_DIRTY)
{
/*** omitted ***/
}
/*
* To change the association of a valid buffer, we'll need to have
* exclusive lock on both the old and new mapping partitions.
* Step 6: lock the BufTable.
*/
if (oldFlags & BM_TAG_VALID)
{
if (oldPartitionLock < newPartitionLock)
{
LWLockAcquire(oldPartitionLock, LW_EXCLUSIVE);
LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
}
else if (oldPartitionLock > newPartitionLock)
{
LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
LWLockAcquire(oldPartitionLock, LW_EXCLUSIVE);
}
else
{
/* only one partition, only one lock */
LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
}
}
else
{
/* if it wasn't valid, we need only the new partition */
LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
/* remember we have no old-partition lock or tag */
oldPartitionLock = NULL;
/* this just keeps the compiler quiet about uninit variables */
oldHash = 0;
}
/*
* Try to make a hashtable entry for the buffer under its new tag.
* This could fail because while we were writing someone else
* allocated another buffer for the same block we want to read in.
* Note that we have not yet removed the hashtable entry for the old
* tag.
* Step 7: insert the buf into the BufTable.
*/
buf_id = BufTableInsert(&newTag, newHash, buf->buf_id);
if (buf_id >= 0)
{
/*** omitted; this omitted code is very important and is the collision handling explained earlier ***/
return buf;
}
/*
* Need to lock the buffer header too in order to change its tag.
*/
buf_state = LockBufHdr(buf);
/*
* Somebody could have pinned or re-dirtied the buffer while we were
* doing the I/O and making the new hashtable entry. If so, we can't
* recycle this buffer; we must undo everything we've done and start
* over with a new victim buffer.
* Step 8: in the meantime another process may have pinned or re-dirtied this buf,
* in which case it cannot be used and the whole procedure must be repeated.
*/
oldFlags = buf_state & BUF_FLAG_MASK;
if (BUF_STATE_GET_REFCOUNT(buf_state) == 1 && !(oldFlags & BM_DIRTY))
break;
UnlockBufHdr(buf, buf_state);
BufTableDelete(&newTag, newHash);
if (oldPartitionLock != NULL &&
oldPartitionLock != newPartitionLock)
LWLockRelease(oldPartitionLock);
LWLockRelease(newPartitionLock);
UnpinBuffer(buf, true);
}
/*
* Okay, it's finally safe to rename the buffer.
*
* Clearing BM_VALID here is necessary, clearing the dirtybits is just
* paranoia. We also reset the usage_count since any recency of use of
* the old content is no longer relevant. (The usage_count starts out at
* 1 so that the buffer can survive one clock-sweep pass.)
*
* Make sure BM_PERMANENT is set for buffers that must be written at every
* checkpoint. Unlogged buffers only need to be written at shutdown
* checkpoints, except for their "init" forks, which need to be treated
* just like permanent relations.
*/
buf->tag = newTag;
buf_state &= ~(BM_VALID | BM_DIRTY | BM_JUST_DIRTIED |
BM_CHECKPOINT_NEEDED | BM_IO_ERROR | BM_PERMANENT |
BUF_USAGECOUNT_MASK);
if (relpersistence == RELPERSISTENCE_PERMANENT || forkNum == INIT_FORKNUM)
buf_state |= BM_TAG_VALID | BM_PERMANENT | BUF_USAGECOUNT_ONE;
else
buf_state |= BM_TAG_VALID | BUF_USAGECOUNT_ONE;
UnlockBufHdr(buf, buf_state);
/* Step 9: delete the evicted block from the hash table */
if (oldPartitionLock != NULL)
{
BufTableDelete(&oldTag, oldHash);
if (oldPartitionLock != newPartitionLock)
LWLockRelease(oldPartitionLock);
}
/* Step 10: done, unlock the BufTable */
LWLockRelease(newPartitionLock);
/*
* Buffer contents are currently invalid. Try to get the io_in_progress
* lock. If StartBufferIO returns false, then someone else managed to
* read it before we did, so there's nothing left for BufferAlloc() to do.
*/
if (StartBufferIO(buf, true))
*foundPtr = FALSE;
else
*foundPtr = TRUE;
return buf;
}
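Let us walk through the flow of BufferAlloc:
-
Step 1: lock the BufTable.
Why the lock is needed is discussed below.
-
Step 2: check whether the block to be loaded (the block for newTag) is already cached.
If it is, return it directly; otherwise go to step 3.
-
Step 3: unlock the BufTable.
Why the lock is released is discussed below.
-
Step 4: obtain a buf from the shared buffer pool according to the strategy.
The buf may be a free buf, or it may be a buf used by a **block that is being evicted (oldTag)**.
-
Step 5: check whether the buf obtained in step 4 has been flushed to disk.
The buf from step 4 may hold an evicted block whose data has not yet been written to disk, so step 5 checks for that and flushes the data if necessary.
-
Step 6: lock the BufTable.
Why the lock is needed is discussed below.
-
Step 7: insert the buf for newTag into the BufTable.
If the insert finds that the same newTag is already in the BufTable, another process has already loaded the block, so the current process gives up its buf and uses the existing one.
-
Step 8: check whether the buf obtained earlier has been used by someone else in the meantime.
If so, give up the buf and roll back step 7 (BufTableDelete(&newTag, newHash)).
-
Step 9: delete the evicted block (oldTag) from the BufTable.
-
Step 10: unlock the BufTable.
Why the lock is released is discussed below.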
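Now consider the lock and unlock operations on the BufTable in this flow:
- The lock in step 1: step 2 queries the BufTable, so a read lock on the BufTable is needed.
- The unlock in step 3: this one is crucial. From a pure correctness standpoint the BufTable would not have to be unlocked at step 3. But steps 4 and 5 that follow are rather time-consuming, step 5 in particular because it may involve a flush to disk, so for the sake of concurrency the BufTable must be unlocked before steps 4 and 5.
- The lock in step 6: step 7 writes newTag into the BufTable, so a write lock on the BufTable is needed.
- The unlock in step 10: the whole procedure is finished, so the BufTable lock can be released.
The key point in this locking sequence is step 3, the unlock done for performance reasons. It improves the concurrency of the BufTable, but what problem does it introduce?
In step 4 we obtain a buf that may hold a block no process is currently accessing (which is why StrategyGetBuffer picked it for eviction). The oldTag of that block has not yet been deleted from the BufTable; the deletion only happens in step 9. Since the BufTable is not locked at this point, other processes can look up, access and modify the block for oldTag while the current process is in steps 4 and 5. So after steps 6 and 7, before the current process commits to using the buf, it must check whether the data in the buf has been modified or whether another process is using it. If so, it must give up the buf and find another one.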
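This flow looks rather cumbersome. Could step 9 be moved earlier, for example into the following order?
-
Step 1: lock the BufTable.
-
Step 2: check whether the block to be loaded (the block for newTag) is already cached.
-
Step 3 (originally step 4): obtain a buf from the shared buffer pool according to the strategy.
-
Step 4 (originally step 9): delete the evicted block (oldTag) from the BufTable.
-
Step 5 (originally step 3): unlock the BufTable.
-
Step 6: check whether the buf obtained in step 3 has been flushed to disk.
-
Step 7: lock the BufTable.
-
Step 8: insert the buf for newTag into the BufTable.
-
Step 9: check whether the buf obtained earlier has been used by someone else in the meantime (no longer needed).
-
Step 10: unlock the BufTable.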
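With this change, the block is deleted from the BufTable right after the buf is obtained (steps 3 and 4), so no other process can use the block afterwards. But this causes more serious problems:
- Problem 1: the BufTable can only be unlocked after the buf has been obtained (by calling StrategyGetBuffer) and oldTag has been deleted, which reduces the concurrency of the BufTable (although it may still be slightly better than unlocking only at the very end).
- Problem 2: if step 8 finds that the block has already been loaded by another process (BufTableInsert returns a buf_id greater than or equal to 0), the deletion in step 4 was for nothing. Worse, a data block has been evicted from the cache for no reason, and it will have to be loaded again the next time it is needed.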
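Summary
For the BufferAlloc scenario, PostgreSQL's flow is the best option. From the general viewpoint of LRU-style cache management, however, the modified flow is worth considering when concurrency requirements are low and no flushing to disk is involved, because the logic of the original step 8, which checks whether the buf has been used by another process in the meantime, is fairly complex.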
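Shared buffer replacement strategies
The number of buffers in the pool is fixed at initialization (it is held in the global variable NBuffers, with a default of 1000 as cited here) and does not change afterwards. In the course of normal operation the buffers can therefore run out, and some buffers that have not been used recently must be replaced to make room for the requested blocks.
PostgreSQL provides two buffer replacement strategies: the general replacement strategy and the buffer ring replacement strategy. In the StrategyGetBuffer code above, the buffer ring strategy is implemented by GetBufferFromRing, called in the if (strategy != NULL) block at the top of the function. The rest of the function implements the general replacement strategy. The two are described in turn below.
General replacement strategy
The two steps of the general replacement strategy were mentioned earlier; here they are described in more detail.
If a free buffer exists, take one
The buffer pool maintains a FreeList, which is a singly linked list. The buffers on the FreeList are linked through the freeNext field of their descriptors, and the BufferStrategyControl structure records the first and last elements of the list. According to the referenced book, a buffer is appended to the tail of the FreeList when its refcount drops to 0, and a free buffer is taken from the head when one is needed. Treat the first half of that statement as a simplification: in the code shown in this article, a buffer whose refcount is 0 normally stays in the pool and is reclaimed by the clock sweep described below. BufferStrategyControl is defined as follows: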
typedef struct
{
/* Spinlock: protects the values below */
slock_t buffer_strategy_lock;
/*
* Clock sweep hand: index of next buffer to consider grabbing. Note that
* this isn't a concrete buffer - we only ever increase the value. So, to
* get an actual buffer, it needs to be used modulo NBuffers.
*/
pg_atomic_uint32 nextVictimBuffer;
int firstFreeBuffer; /* Head of list of unused buffers */
int lastFreeBuffer; /* Tail of list of unused buffers */
/*
* NOTE: lastFreeBuffer is undefined when firstFreeBuffer is -1 (that is,
* when the list is empty)
*/
/*
* Statistics. These counters should be wide enough that they can't
* overflow during a single bgwriter cycle.
*/
uint32 completePasses; /* Complete cycles of the clock sweep */
pg_atomic_uint32 numBufferAllocs; /* Buffers allocated since last reset */
/*
* Bgworker process to be notified upon activity or -1 if none. See
* StrategyNotifyBgWriter.
*/
int bgwprocno;
} BufferStrategyControl;
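The free-list step of StrategyGetBuffer can be reduced to the sketch below. It is a single-threaded model under stated assumptions: the spinlocks are omitted, ToyDesc and ToyControl are hypothetical cut-down versions of the real structures, and the value -2 for the not-in-list marker is assumed rather than taken from the listing.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_FREENEXT_END_OF_LIST (-1)
#define TOY_FREENEXT_NOT_IN_LIST (-2)   /* assumed marker value */

typedef struct ToyDesc
{
    int buf_id;
    int freeNext;     /* next free buffer, or one of the markers above */
    int refcount;
    int usagecount;
} ToyDesc;

typedef struct ToyControl
{
    int firstFreeBuffer;   /* head of the free list, negative if empty */
} ToyControl;

/*
 * Pop buffers off the free list until a usable one is found.
 * Returns NULL when the list is exhausted, in which case the caller
 * would fall back to the clock sweep. Locking is omitted.
 */
static ToyDesc *toy_pop_free(ToyControl *ctl, ToyDesc *descs)
{
    while (ctl->firstFreeBuffer >= 0)
    {
        ToyDesc *buf = &descs[ctl->firstFreeBuffer];

        /* unconditionally remove the buffer from the list */
        assert(buf->freeNext != TOY_FREENEXT_NOT_IN_LIST);
        ctl->firstFreeBuffer = buf->freeNext;
        buf->freeNext = TOY_FREENEXT_NOT_IN_LIST;

        /* usable only if nobody pins it and it was not used recently */
        if (buf->refcount == 0 && buf->usagecount == 0)
            return buf;
    }
    return NULL;
}
```

A pinned buffer at the head of the list is dropped from the list and skipped, just as the retry loop in StrategyGetBuffer does.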
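Using the replacement mechanism to pick a buffer
The replacement mechanism is a simple clock-sweep algorithm. Its main steps are:
- Initialize trycounter = NBuffers.
- Use the nextVictimBuffer field, whose initial value is 0, to find the corresponding buffer.
- Advance nextVictimBuffer by 1. The counter itself only ever increases and is taken modulo NBuffers, so after the last buffer in the pool the sweep wraps around to buffer 0.
- If the refcount of the buffer found in step 2 is 0:
a. If usagecount is not 0, decrement usagecount by 1 and reset trycounter to NBuffers.
b. Otherwise take this buffer and return it.
- If the refcount of the buffer found in step 2 is not 0, decrement trycounter by 1; if trycounter reaches 0, report an error.
- Go back to step 2.
To make this clearer, the relevant code (the clock-sweep loop at the end of StrategyGetBuffer) is listed again below, with comments that map it to the steps above: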
trycounter = NBuffers; /* step 1 */
for (;;)
{
/* steps 2 and 3 */
buf = GetBufferDescriptor(ClockSweepTick());
/*
* If the buffer is pinned or has a nonzero usage_count, we cannot use
* it; decrement the usage_count (unless pinned) and keep scanning.
*/
local_buf_state = LockBufHdr(buf);
/* step 4: is refcount 0? */
if (BUF_STATE_GET_REFCOUNT(local_buf_state) == 0)
{
/* is usagecount 0? */
if (BUF_STATE_GET_USAGECOUNT(local_buf_state) != 0)
{
/* usagecount is not 0: decrement it and reset trycounter to NBuffers */
local_buf_state -= BUF_USAGECOUNT_ONE;
trycounter = NBuffers;
}
else
{
/* usagecount is 0: take this buffer and return it */
if (strategy != NULL) /* with a buffer ring strategy, add the buffer to the ring */
AddBufferToRing(strategy, buf);
*buf_state = local_buf_state;
return buf;
}
}
else if (--trycounter == 0)
{
/* step 5 */
/*
* We've scanned all the buffers without making any state changes,
* so all the buffers are pinned (or were when we looked at them).
* We could hope that someone will free one eventually, but it's
* probably better to fail than to risk getting stuck in an
* infinite loop.
*/
UnlockBufHdr(buf, local_buf_state);
elog(ERROR, "no unpinned buffers available");
}
UnlockBufHdr(buf, local_buf_state);
/* step 6: continue the loop */
}
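Core idea:
In any database, the core idea of cache replacement is to swap out pages that are accessed infrequently. PostgreSQL uses usage_count to represent how often a page is accessed. A page that has just been loaded starts with a usage_count of 1 (BufferAlloc sets BUF_USAGECOUNT_ONE, as the comment in its code explains), and pinning the page normally increments usage_count up to a small maximum. The more often a page is accessed, the larger its usage_count, the less likely the clock-sweep algorithm is to bring it down to 0, and so the less likely the page is to be swapped out.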
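The effect of usage_count on victim selection is easy to see in a stand-alone simulation. The sketch below follows the loop structure of the listing above but is a toy: ToyClock and toy_clock_sweep are hypothetical, there is no locking, and running out of unpinned buffers is reported by returning -1 instead of raising an error.

```c
#include <assert.h>

#define TOY_NBUFFERS 4

typedef struct ToyDesc
{
    int refcount;     /* pins currently held */
    int usagecount;   /* recent-use counter */
} ToyDesc;

typedef struct ToyClock
{
    unsigned next_victim;          /* only ever increases, used modulo N */
    ToyDesc  bufs[TOY_NBUFFERS];
} ToyClock;

/* Return the index of the victim buffer, or -1 if every buffer is pinned. */
static int toy_clock_sweep(ToyClock *clk)
{
    int trycounter = TOY_NBUFFERS;

    for (;;)
    {
        int      idx = (int) (clk->next_victim++ % TOY_NBUFFERS);
        ToyDesc *buf = &clk->bufs[idx];

        if (buf->refcount == 0)
        {
            if (buf->usagecount != 0)
            {
                /* recently used: give it another chance */
                buf->usagecount--;
                trycounter = TOY_NBUFFERS;
            }
            else
                return idx;        /* unpinned and not recently used */
        }
        else if (--trycounter == 0)
            return -1;             /* a full pass found only pinned buffers */
    }
}
```

Starting with usage counts {2, 0 (pinned), 1, 0}, the first sweep decrements buffers 0 and 2 and picks buffer 3; buffers that are used more often survive more passes of the clock hand.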
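Buffer ring replacement strategy
The buffer ring is an optimization of the general replacement strategy. Consider this scenario: several processes are performing routine operations on the database when one process starts a full table scan. The scan touches a large number of physical blocks, but each block only once. Under the general replacement strategy, the scan would fill the buffer pool with pages that are used only once and push out many pages that are used repeatedly, which clearly defeats the purpose of the buffer pool, namely reducing I/O. To handle this case, the buffer ring allocates a fixed number of buffers and performs replacement within them first; only when none of them can be replaced does it fall back to the general replacement strategy. The ring is controlled mainly by the BufferAccessStrategyData structure, defined as follows: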
typedef struct BufferAccessStrategyData
{
/* Overall strategy type (the ring control policy) */
BufferAccessStrategyType btype;
/* Number of elements in buffers[] array (the ring size) */
int ring_size;
/*
* Index of the "current" slot in the ring, ie, the one most recently
* returned by GetBufferFromRing.
* (the slot of the buffer most recently handed out from or added to the ring)
*/
int current;
/*
* True if the buffer just returned by StrategyGetBuffer had been in the
* ring already.
* (whether the buffer most recently obtained via StrategyGetBuffer came directly from the ring)
*/
bool current_was_in_ring;
/*
* Array of buffer numbers. InvalidBuffer (that is, zero) indicates we
* have not yet selected a buffer for this ring slot. For allocation
* simplicity this is palloc'd together with the fixed fields of the
* struct.
* (array holding the buffer numbers of the buffers added to the ring)
*/
Buffer buffers[FLEXIBLE_ARRAY_MEMBER];
} BufferAccessStrategyData;
typedef struct BufferAccessStrategyData *BufferAccessStrategy;
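The buffer ring replacement strategy is implemented in GetBufferFromRing, which has three steps:
-
Advance the current index of strategy to the next element of its buffers array (the next candidate buffer). If current already refers to the last element of buffers, set it to 0 (the first element).
-
Examine the element that current refers to. If its value is InvalidBuffer, the ring is not yet full and this slot does not hold a buffer yet. In that case set the current_was_in_ring field of strategy to false and return NULL.
When the caller of GetBufferFromRing (StrategyGetBuffer) sees the NULL return value, it obtains a free buffer with the general replacement strategy and adds it to the ring through AddBufferToRing.
-
If the element that current refers to holds a valid buffer number, check the refcount and usagecount of that buffer. If refcount is 0 and usagecount <= 1 (the buffer has been accessed at most once, most likely by the current process during its full table scan), reuse the buffer and return it. Otherwise the buffer is still in use by another process or was recently used by one, and, as in step 2, the caller obtains a free buffer with the general replacement strategy.
In short: take the buffer for the slot after the current one; if it is a valid buffer, no process is accessing it, and it has been accessed at most once recently, return it; otherwise fall back to the general replacement strategy.
The code is as follows: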
static BufferDesc *
GetBufferFromRing(BufferAccessStrategy strategy, uint32 *buf_state)
{
BufferDesc *buf;
Buffer bufnum;
uint32 local_buf_state; /* to avoid repeated (de-)referencing */
/* Advance to next ring slot */
if (++strategy->current >= strategy->ring_size)
strategy->current = 0;
/*
* If the slot hasn't been filled yet, tell the caller to allocate a new
* buffer with the normal allocation strategy. He will then fill this
* slot by calling AddBufferToRing with the new buffer.
*/
bufnum = strategy->buffers[strategy->current];
if (bufnum == InvalidBuffer)
{
strategy->current_was_in_ring = false;
return NULL;
}
/*
* If the buffer is pinned we cannot use it under any circumstances.
*
* If usage_count is 0 or 1 then the buffer is fair game (we expect 1,
* since our own previous usage of the ring element would have left it
* there, but it might've been decremented by clock sweep since then). A
* higher usage_count indicates someone else has touched the buffer, so we
* shouldn't re-use it.
*/
buf = GetBufferDescriptor(bufnum - 1);
local_buf_state = LockBufHdr(buf);
if (BUF_STATE_GET_REFCOUNT(local_buf_state) == 0
&& BUF_STATE_GET_USAGECOUNT(local_buf_state) <= 1)
{
strategy->current_was_in_ring = true;
*buf_state = local_buf_state;
return buf;
}
UnlockBufHdr(buf, local_buf_state);
/*
* Tell caller to allocate a new buffer with the normal allocation
* strategy. He'll then replace this ring element via AddBufferToRing.
*/
strategy->current_was_in_ring = false;
return NULL;
}
AddBufferToRing
AddBufferToRing was mentioned above; here is its implementation:
static void
AddBufferToRing(BufferAccessStrategy strategy, BufferDesc *buf)
{
strategy->buffers[strategy->current] = BufferDescriptorGetBuffer(buf);
}
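The function is very simple: it stores the buffer number of a buffer (its index in BufferDescriptors plus 1, as returned by BufferDescriptorGetBuffer) in the ring slot that current refers to.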
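The interplay between GetBufferFromRing and AddBufferToRing can be exercised with a toy ring. This sketch makes several assumptions: the ring has only two slots, current starts at -1 so that the first call lands on slot 0, locking is omitted, and ToyRing, ToyDesc and the toy_ functions are hypothetical simplifications of the structures shown above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_RING_SIZE      2
#define TOY_INVALID_BUFFER 0   /* buffer number 0 means "slot not filled" */

typedef struct ToyDesc
{
    int buf_id;
    int refcount;
    int usagecount;
} ToyDesc;

typedef struct ToyRing
{
    int  current;                  /* slot used most recently */
    bool current_was_in_ring;
    int  buffers[TOY_RING_SIZE];   /* buffer numbers (buf_id + 1) */
} ToyRing;

/* Try to reuse the buffer in the next ring slot; NULL means "use the
 * general strategy and then call toy_add_to_ring". */
static ToyDesc *toy_get_from_ring(ToyRing *ring, ToyDesc *descs)
{
    int bufnum;

    /* advance to the next slot, wrapping around */
    if (++ring->current >= TOY_RING_SIZE)
        ring->current = 0;

    bufnum = ring->buffers[ring->current];
    if (bufnum != TOY_INVALID_BUFFER)
    {
        ToyDesc *buf = &descs[bufnum - 1];

        /* reusable only if unpinned and touched at most once */
        if (buf->refcount == 0 && buf->usagecount <= 1)
        {
            ring->current_was_in_ring = true;
            return buf;
        }
    }
    ring->current_was_in_ring = false;
    return NULL;
}

/* Remember a buffer obtained elsewhere in the current ring slot. */
static void toy_add_to_ring(ToyRing *ring, const ToyDesc *buf)
{
    ring->buffers[ring->current] = buf->buf_id + 1;
}
```

Once both slots are filled, the ring hands back its own buffers in rotation, and a slot whose buffer was touched by someone else (usagecount above 1) is refused and gets replaced through the general strategy.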