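This section briefly introduces RecordAndGetPageWithFreeSpace, a storage-related function that PostgreSQL uses while executing an insert. It is called by RelationGetBufferForTuple and returns a block that is expected to have enough free space for the request.
I. Data Structures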
FSMAddress
The internal FSM routines work on a logical addressing scheme: each level of the tree can be thought of as a separately addressable file.
/*
 * The internal FSM routines work on a logical addressing scheme. Each
 * level of the tree can be thought of as a separately addressable file.
 */
typedef struct
{
    int         level;          /* level */
    int         logpageno;      /* page number within the level */
} FSMAddress;
/* Address of the root page. */
static const FSMAddress FSM_ROOT_ADDRESS = {FSM_ROOT_LEVEL, 0};
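To make the "separately addressable file per level" idea concrete, here is a minimal standalone sketch (not PostgreSQL code) that maps a heap block to its bottom-level FSMAddress and then walks up to the root with the same division/modulo arithmetic that freespace.c uses for parent pages. It assumes the default 8 KB block size, for which SlotsPerFSMPage works out to 4069 and the tree has three levels; get_location and get_parent are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for an 8 KB build: 4069 slots per FSM page, 3 levels. */
#define SLOTS_PER_FSM_PAGE 4069
#define FSM_TREE_DEPTH     3
#define FSM_ROOT_LEVEL     (FSM_TREE_DEPTH - 1)
#define FSM_BOTTOM_LEVEL   0

typedef struct
{
    int level;      /* 0 = bottom (leaf) level, FSM_ROOT_LEVEL = root */
    int logpageno;  /* page number within that level */
} FSMAddress;

/* Bottom-level FSM page, and the slot on it, that describe a heap block. */
static FSMAddress
get_location(uint32_t heapblk, uint16_t *slot)
{
    FSMAddress addr = {FSM_BOTTOM_LEVEL, (int) (heapblk / SLOTS_PER_FSM_PAGE)};

    *slot = (uint16_t) (heapblk % SLOTS_PER_FSM_PAGE);
    return addr;
}

/* Parent of an FSM page, and the slot in the parent that points to it. */
static FSMAddress
get_parent(FSMAddress child, uint16_t *slot)
{
    FSMAddress parent;

    assert(child.level < FSM_ROOT_LEVEL);
    parent.level = child.level + 1;
    parent.logpageno = child.logpageno / SLOTS_PER_FSM_PAGE;
    *slot = (uint16_t) (child.logpageno % SLOTS_PER_FSM_PAGE);
    return parent;
}
```

For heap block 20,000,000 this gives bottom page 4915 (slot 865), then page 1 on level 1 (slot 846), then the single root page 0 on level 2 (slot 1); each level is simply numbered from 0 on its own.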
FSMPage
Structure of an FSM page. See src/backend/storage/freespace/README for details.
/*
 * Structure of a FSM page. See src/backend/storage/freespace/README for
 * details.
 */
typedef struct
{
    /*
     * fsm_search_avail() tries to spread the load of multiple backends by
     * returning different pages to different backends in a round-robin
     * fashion. fp_next_slot points to the next slot to be returned (assuming
     * there's enough space on it for the request). It's defined as an int,
     * because it's updated without an exclusive lock. uint16 would be more
     * appropriate, but int is more likely to be atomically
     * fetchable/storable.
     */
    int         fp_next_slot;

    /*
     * fp_nodes contains the binary tree, stored in array. The first
     * NonLeafNodesPerPage elements are upper nodes, and the following
     * LeafNodesPerPage elements are leaf nodes. Unused nodes are zero.
     */
    uint8       fp_nodes[FLEXIBLE_ARRAY_MEMBER];
} FSMPageData;

typedef FSMPageData *FSMPage;
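The README describes fp_nodes as a max-tree: every upper node holds the largest category among its children, so the root alone tells whether any heap block covered by the page has enough room. The following is a simplified standalone sketch of that update rule, assuming a toy page with only 4 leaves and the usual array layout in which node x has children 2x+1 and 2x+2; set_leaf and the NUM_* constants are hypothetical, and unlike the real code it always propagates all the way to the root.

```c
#include <assert.h>
#include <stdint.h>

/* Toy page: 4 leaves, so 3 upper (non-leaf) nodes stored first. */
#define NUM_LEAVES   4
#define NUM_NONLEAF  (NUM_LEAVES - 1)
#define NUM_NODES    (NUM_NONLEAF + NUM_LEAVES)

/* Array-embedded binary tree: children of node x are 2x+1 and 2x+2. */
#define LEFT_CHILD(x)  (2 * (x) + 1)
#define RIGHT_CHILD(x) (2 * (x) + 2)
#define PARENT(x)      (((x) - 1) / 2)

/* Store a leaf's category, then recompute each ancestor as max(children). */
static void
set_leaf(uint8_t nodes[NUM_NODES], int slot, uint8_t cat)
{
    int node = NUM_NONLEAF + slot;   /* leaves follow the upper nodes */

    nodes[node] = cat;
    while (node > 0)
    {
        int     parent = PARENT(node);
        uint8_t l = nodes[LEFT_CHILD(parent)];
        uint8_t r = nodes[RIGHT_CHILD(parent)];

        nodes[parent] = l > r ? l : r;
        node = parent;
    }
}
```

Setting slot 0 to 5 and slot 3 to 9 leaves 9 at the root; lowering slot 3 to 1 afterwards drops the root back to 5, which is why a search can reject a whole FSM page by looking at node 0 only.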
FSMLocalMap
For small relations, no FSM is created, to save space; instead, a local in-memory map of pages to try is used.
/* Either already tried, or beyond the end of the relation */
#define FSM_LOCAL_NOT_AVAIL 0x00

/* Available to try */
#define FSM_LOCAL_AVAIL     0x01

/*
 * For small relations, we don't create FSM to save space, instead we use
 * local in-memory map of pages to try. To locate free space, we simply try
 * pages directly without knowing ahead of time how much free space they have.
 *
 * Note that this map is used to the find the block with required free space
 * for any given relation. We clear this map when we have found a block with
 * enough free space, when we extend the relation, or on transaction abort.
 * See src/backend/storage/freespace/README for further details.
 */
typedef struct
{
    BlockNumber nblocks;                            //number of blocks covered by the map
    uint8       map[HEAP_FSM_CREATION_THRESHOLD];   //one availability flag per block
} FSMLocalMap;

static FSMLocalMap fsm_local_map =
{
    0,
    {
        FSM_LOCAL_NOT_AVAIL
    }
};

#define FSM_LOCAL_MAP_EXISTS (fsm_local_map.nblocks > 0)
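II. Source Code Walkthrough
RecordAndGetPageWithFreeSpace returns a block that should satisfy the request. Its main logic is:
1. Initialize local variables.
2. If a local map exists, use it first by calling fsm_local_search.
3. If there is neither a local map nor an FSM, create the local map and then call fsm_local_search.
4. Otherwise, search the FSM:
4.1 Convert the free space available on the old page to an FSM category.
4.2 Convert the requested space to an FSM category.
4.3 Compute the FSM location (FSMAddress and slot) of the old heap block.
4.4 Update that slot and search the same FSM page for a target slot.
4.5 If the target slot is valid, return the corresponding heap block; otherwise fall back to fsm_search to find a suitable block.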
/*
 * RecordAndGetPageWithFreeSpace - update info about a page and try again.
 *
 * We provide this combo form to save some locking overhead, compared to
 * separate RecordPageWithFreeSpace + GetPageWithFreeSpace calls. There's
 * also some effort to return a page close to the old page; if there's a
 * page with enough free space on the same FSM page where the old one page
 * is located, it is preferred.
 *
 * For very small heap relations that don't have a FSM, we update the local
 * map to indicate we have tried a page, and return the next page to try.
 */
BlockNumber
RecordAndGetPageWithFreeSpace(Relation rel, BlockNumber oldPage,
                              Size oldSpaceAvail, Size spaceNeeded)
{
    int         old_cat;
    int         search_cat;
    FSMAddress  addr;           //FSM address
    uint16      slot;           //slot number
    int         search_slot;
    BlockNumber nblocks = InvalidBlockNumber;
    /* First try the local map, if it exists. */
    //#define FSM_LOCAL_MAP_EXISTS (fsm_local_map.nblocks > 0)
    if (FSM_LOCAL_MAP_EXISTS)
    {
        Assert((rel->rd_rel->relkind == RELKIND_RELATION ||
                rel->rd_rel->relkind == RELKIND_TOASTVALUE) &&
               fsm_local_map.map[oldPage] == FSM_LOCAL_AVAIL);
        //mark oldPage as no longer available
        fsm_local_map.map[oldPage] = FSM_LOCAL_NOT_AVAIL;
        //search the local map and return the result
        return fsm_local_search();
    }
    if (!fsm_allow_writes(rel, oldPage, InvalidBlockNumber, &nblocks))
    {
        //---- fsm_allow_writes() returned false: the FSM may not be written
        /*
         * If we have neither a local map nor a FSM, we probably just tried
         * the target block in the smgr relation entry and failed, so we'll
         * need to create the local map.
         */
        //build the local map
        fsm_local_set(rel, nblocks);
        //search the local map
        return fsm_local_search();
    }
    /* Normal FSM logic follows */
    //------ FSM-based search
    //oldSpaceAvail / 32 (8KB blocks), capped at 254; 255 if >= MaxFSMRequestSize
    old_cat = fsm_space_avail_to_cat(oldSpaceAvail);
    //(needed + FSM_CAT_STEP - 1) / FSM_CAT_STEP
    //#define FSM_CAT_STEP (BLCKSZ / FSM_CATEGORIES)
    //#define FSM_CATEGORIES 256
    search_cat = fsm_space_needed_to_cat(spaceNeeded);

    /* Get the location of the FSM byte representing the heap block */
    addr = fsm_get_location(oldPage, &slot);

    //store old_cat at the given FSM page and slot, then search that same
    //FSM page for a slot with at least search_cat; returns the slot or -1
    search_slot = fsm_set_and_search(rel, addr, slot, old_cat, search_cat);

    /*
     * If fsm_set_and_search found a suitable new block, return that.
     * Otherwise, search as usual.
     */
    if (search_slot != -1)
        return fsm_get_heap_blk(addr, search_slot);
    else
        return fsm_search(rel, search_cat);
}
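fsm_set_and_search itself is not listed in this post. Conceptually it records old_cat for the old page's slot and then searches the same FSM page for a slot with at least search_cat, which is what makes this function prefer a block near the old one. The sketch below is a simplification under stated assumptions: one FSM page is modeled as a flat array of 8 leaf categories, and a linear round-robin scan stands in for PostgreSQL's search of the fp_nodes tree. ToyFsmPage and toy_set_and_search are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define SLOTS 8   /* toy FSM page with 8 leaf slots (hypothetical size) */

typedef struct
{
    int     next_slot;      /* round-robin start hint, like fp_next_slot */
    uint8_t cat[SLOTS];     /* free-space category of each heap block */
} ToyFsmPage;

/*
 * Record the old page's new category, then look for a slot on the same
 * FSM page whose category is >= search_cat. Returns the slot or -1.
 * Linear scan for clarity; PostgreSQL walks the fp_nodes max-tree instead.
 */
static int
toy_set_and_search(ToyFsmPage *page, int slot, uint8_t new_cat,
                   uint8_t search_cat)
{
    page->cat[slot] = new_cat;
    for (int i = 0; i < SLOTS; i++)
    {
        int s = (page->next_slot + i) % SLOTS;

        if (page->cat[s] >= search_cat)
        {
            page->next_slot = s + 1;    /* next caller starts after this one */
            return s;
        }
    }
    return -1;
}
```

With old page 1 reporting category 0 and a request of category 1, a page whose slot 4 holds category 2 yields slot 4, the same slot the trace below reaches; once slot 4 is also marked full, the call returns -1 and the real code would fall back to fsm_search.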
/*
 * Search the local map for an available block to try, in descending order.
 * As such, there is no heuristic available to decide which order will be
 * better to try, but the probability of having space in the last block in the
 * map is higher because that is the most recent block added to the heap.
 *
 * This function is used when there is no FSM.
 */
static BlockNumber
fsm_local_search(void)
{
    BlockNumber target_block;

    /* Local map must be set by now. */
    Assert(FSM_LOCAL_MAP_EXISTS);

    //start just past the last block in the map
    target_block = fsm_local_map.nblocks;
    do
    {
        target_block--;                 //step back one block, starting with the last
        if (fsm_local_map.map[target_block] == FSM_LOCAL_AVAIL)
            return target_block;        //first available block scanning backwards
    } while (target_block > 0);
    //target_block == 0

    /*
     * If we didn't find any available block to try in the local map, then
     * clear it. This prevents us from using the map again without setting it
     * first, which would otherwise lead to the same conclusion again and
     * again.
     */
    FSMClearLocalMap();

    //nothing available: return InvalidBlockNumber
    return InvalidBlockNumber;
}
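The backward scan and the "clear when exhausted" rule can be exercised outside PostgreSQL with a small standalone model. This sketch assumes a threshold of 4 blocks in place of HEAP_FSM_CREATION_THRESHOLD and uses UINT32_MAX in place of InvalidBlockNumber; LocalMap and local_search are hypothetical names, and resetting the struct stands in for FSMClearLocalMap().

```c
#include <assert.h>
#include <stdint.h>

#define THRESHOLD     4             /* stands in for HEAP_FSM_CREATION_THRESHOLD */
#define INVALID_BLOCK UINT32_MAX    /* stands in for InvalidBlockNumber */
#define NOT_AVAIL     0x00
#define AVAIL         0x01

typedef struct
{
    uint32_t nblocks;               /* 0 means "no local map" */
    uint8_t  map[THRESHOLD];
} LocalMap;

/* Scan from the last block down to block 0; clear the map if nothing is left. */
static uint32_t
local_search(LocalMap *m)
{
    uint32_t target = m->nblocks;

    assert(m->nblocks > 0);
    do
    {
        target--;
        if (m->map[target] == AVAIL)
            return target;
    } while (target > 0);

    /* nothing left to try: reset the map, like FSMClearLocalMap() */
    m->nblocks = 0;
    for (int i = 0; i < THRESHOLD; i++)
        m->map[i] = NOT_AVAIL;
    return INVALID_BLOCK;
}
```

Starting from {AVAIL, NOT_AVAIL, AVAIL, NOT_AVAIL}, successive calls (with the caller marking each returned block NOT_AVAIL, as RecordAndGetPageWithFreeSpace does) return block 2, then block 0, then INVALID_BLOCK, at which point the map no longer exists.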
/*
 * Initialize or update the local map of blocks to try, for when there is
 * no FSM.
 *
 * When we initialize the map, the whole heap is potentially available to
 * try. Testing revealed that trying every block can cause a small
 * performance dip compared to when we use a FSM, so we try every other
 * block instead.
 */
static void
fsm_local_set(Relation rel, BlockNumber cur_nblocks)
{
    BlockNumber blkno,
                cached_target_block;

    /* The local map must not be set already. */
    Assert(!FSM_LOCAL_MAP_EXISTS);

    /*
     * Starting at the current last block in the relation and working
     * backwards, mark alternating blocks as available.
     */
    blkno = cur_nblocks - 1;    //last block
    while (true)
    {
        //mark as available
        fsm_local_map.map[blkno] = FSM_LOCAL_AVAIL;
        if (blkno >= 2)
            blkno -= 2;
        else
            break;
    }

    /* Cache the number of blocks. */
    fsm_local_map.nblocks = cur_nblocks;

    /* Set the status of the cached target block to 'unavailable'. */
    cached_target_block = RelationGetTargetBlock(rel);
    if (cached_target_block != InvalidBlockNumber &&
        cached_target_block < cur_nblocks)
        fsm_local_map.map[cached_target_block] = FSM_LOCAL_NOT_AVAIL;
}
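The "every other block, counting back from the last one" pattern is easiest to see on a concrete relation. This standalone sketch mirrors fsm_local_set under the assumptions that the threshold is 4 blocks and that the relation's cached target block is passed in directly instead of being read with RelationGetTargetBlock(); LocalMap, local_set and the constants are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define THRESHOLD     4             /* stands in for HEAP_FSM_CREATION_THRESHOLD */
#define INVALID_BLOCK UINT32_MAX    /* stands in for InvalidBlockNumber */
#define NOT_AVAIL     0x00
#define AVAIL         0x01

typedef struct
{
    uint32_t nblocks;               /* 0 means "no local map" */
    uint8_t  map[THRESHOLD];
} LocalMap;

/* Mark every other block available, starting from the last one, then
 * exclude the block cached as the relation's current insert target. */
static void
local_set(LocalMap *m, uint32_t cur_nblocks, uint32_t cached_target)
{
    uint32_t blkno = cur_nblocks - 1;

    assert(m->nblocks == 0 && cur_nblocks > 0 && cur_nblocks <= THRESHOLD);
    while (1)
    {
        m->map[blkno] = AVAIL;
        if (blkno >= 2)
            blkno -= 2;
        else
            break;
    }
    m->nblocks = cur_nblocks;
    if (cached_target != INVALID_BLOCK && cached_target < cur_nblocks)
        m->map[cached_target] = NOT_AVAIL;
}
```

For a 4-block relation whose cached target is block 3 (the block that was just tried and found full), blocks 3 and 1 are marked first and block 3 is then excluded, so only block 1 remains to try; a 3-block relation with no cached target gets blocks 2 and 0.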
/*
 * Return category corresponding x bytes of free space
 */
static uint8
fsm_space_avail_to_cat(Size avail)
{
    int         cat;

    //the available space must be smaller than a block
    Assert(avail < BLCKSZ);

    //at least MaxFSMRequestSize bytes free: return the top category, 255
    //#define MaxFSMRequestSize MaxHeapTupleSize
    //#define MaxHeapTupleSize (BLCKSZ - MAXALIGN(SizeOfPageHeaderData + sizeof(ItemIdData)))
    if (avail >= MaxFSMRequestSize)
        return 255;

    //#define FSM_CAT_STEP (BLCKSZ / FSM_CATEGORIES)
    //#define FSM_CATEGORIES 256
    //with 8KB blocks, FSM_CAT_STEP = 32
    cat = avail / FSM_CAT_STEP;

    /*
     * The highest category, 255, is reserved for MaxFSMRequestSize bytes or
     * more.
     */
    if (cat > 254)
        cat = 254;              //cap at 254

    return (uint8) cat;
}
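The rounding-down rule can be checked with concrete numbers. This standalone sketch assumes an 8 KB build on a platform with 8-byte MAXALIGN, so FSM_CAT_STEP is 32 and MaxFSMRequestSize is 8192 - MAXALIGN(24 + 4) = 8160; space_avail_to_cat and the upper-case constants are illustrative stand-ins for the PostgreSQL definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed 8 KB build: 256 categories of 32 bytes each. */
#define BLCKSZ               8192
#define FSM_CATEGORIES       256
#define FSM_CAT_STEP         (BLCKSZ / FSM_CATEGORIES)
#define MAX_FSM_REQUEST_SIZE 8160   /* MaxHeapTupleSize for 8 KB pages */

/* Round the available space DOWN to a category; 255 is reserved. */
static uint8_t
space_avail_to_cat(size_t avail)
{
    int cat;

    assert(avail < BLCKSZ);
    if (avail >= MAX_FSM_REQUEST_SIZE)
        return 255;
    cat = (int) (avail / FSM_CAT_STEP);
    if (cat > 254)
        cat = 254;
    return (uint8_t) cat;
}
```

Rounding down is the conservative direction for recorded space: a page with 31 free bytes is filed under category 0, so it is never advertised as holding a full 32 bytes. The 16 bytes left on the old page in the trace below land in category 0 as well.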
/*
 * Which category does a page need to have, to accommodate x bytes of data?
 * While fsm_size_to_avail_cat() rounds down, this needs to round up.
 */
static uint8
fsm_space_needed_to_cat(Size needed)
{
    int         cat;

    /* Can't ask for more space than the highest category represents */
    if (needed > MaxFSMRequestSize)
        elog(ERROR, "invalid FSM request size %zu", needed);

    if (needed == 0)
        return 1;

    cat = (needed + FSM_CAT_STEP - 1) / FSM_CAT_STEP;

    if (cat > 255)
        cat = 255;

    return (uint8) cat;
}
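The request side rounds up instead, so that any page whose category is at least the returned value really has the requested bytes free. The sketch below makes the same 8 KB assumptions as the previous section (FSM_CAT_STEP = 32, MaxFSMRequestSize = 8160) and replaces elog(ERROR) with a message and exit(), since it runs outside PostgreSQL; space_needed_to_cat is an illustrative name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define BLCKSZ               8192
#define FSM_CATEGORIES       256
#define FSM_CAT_STEP         (BLCKSZ / FSM_CATEGORIES)
#define MAX_FSM_REQUEST_SIZE 8160   /* assumed MaxHeapTupleSize for 8 KB pages */

/* Round the requested size UP, so a page in this category (or higher)
 * is guaranteed to have at least `needed` bytes free. */
static uint8_t
space_needed_to_cat(size_t needed)
{
    int cat;

    if (needed > MAX_FSM_REQUEST_SIZE)
    {
        /* PostgreSQL raises elog(ERROR) here instead */
        fprintf(stderr, "invalid FSM request size %zu\n", needed);
        exit(1);
    }
    if (needed == 0)
        return 1;
    cat = (int) ((needed + FSM_CAT_STEP - 1) / FSM_CAT_STEP);
    if (cat > 255)
        cat = 255;
    return (uint8_t) cat;
}
```

A 33-byte request needs category 2, because category 1 only guarantees 32 bytes. The 32-byte request in the trace below maps to category 1.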
/*
 * Return the FSM location corresponding to given heap block.
 */
//called as: addr = fsm_get_location(oldPage, &slot);
static FSMAddress
fsm_get_location(BlockNumber heapblk, uint16 *slot)
{
    FSMAddress  addr;

    addr.level = FSM_BOTTOM_LEVEL;
    //#define SlotsPerFSMPage LeafNodesPerPage
    //#define LeafNodesPerPage (NodesPerPage - NonLeafNodesPerPage)
    //#define NodesPerPage (BLCKSZ - MAXALIGN(SizeOfPageHeaderData) - offsetof(FSMPageData, fp_nodes))
    //#define NonLeafNodesPerPage (BLCKSZ / 2 - 1)
    addr.logpageno = heapblk / SlotsPerFSMPage;
    *slot = heapblk % SlotsPerFSMPage;

    return addr;
}
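Plugging 8 KB values into the macros above shows where the number of slots per FSM page comes from, and how fsm_get_heap_blk later inverts the mapping. This standalone sketch assumes a 24-byte page header and offsetof(FSMPageData, fp_nodes) = 4 (one int before the array); get_location and get_heap_blk are illustrative re-implementations, not the PostgreSQL functions themselves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed 8 KB build: 24-byte page header, fp_nodes at offset 4. */
#define BLCKSZ             8192
#define PAGE_HEADER        24
#define FP_NODES_OFFSET    4
#define NODES_PER_PAGE     (BLCKSZ - PAGE_HEADER - FP_NODES_OFFSET)   /* 8164 */
#define NON_LEAF_PER_PAGE  (BLCKSZ / 2 - 1)                           /* 4095 */
#define SLOTS_PER_FSM_PAGE (NODES_PER_PAGE - NON_LEAF_PER_PAGE)       /* 4069 */

typedef struct { int level; int logpageno; } FSMAddress;

/* Bottom-level FSM page and slot for a heap block. */
static FSMAddress
get_location(uint32_t heapblk, uint16_t *slot)
{
    FSMAddress addr = {0, (int) (heapblk / SLOTS_PER_FSM_PAGE)};

    *slot = (uint16_t) (heapblk % SLOTS_PER_FSM_PAGE);
    return addr;
}

/* Inverse mapping: turn a slot found on a bottom-level page back into
 * a heap block number. */
static uint32_t
get_heap_blk(FSMAddress addr, uint16_t slot)
{
    assert(addr.level == 0);
    return (uint32_t) addr.logpageno * SLOTS_PER_FSM_PAGE + slot;
}
```

Heap block 1 (oldPage in the trace below) lives at slot 1 of bottom-level page 0, so a slot 4 found on that page maps back to heap block 4; heap block 5000 spills onto the second bottom-level page at slot 931.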
III. Trace Analysis
Test script
15:54:13 (xdb@[local]:5432)testdb=# insert into t1 values (1,'1','1');
Start gdb and set a breakpoint
(gdb) b RecordAndGetPageWithFreeSpace
Breakpoint 1 at 0x8879e4: file freespace.c, line 152.
(gdb) c
Continuing.
Breakpoint 1, RecordAndGetPageWithFreeSpace (rel=0x7fad0df13788, oldPage=1, oldSpaceAvail=16, spaceNeeded=32)
at freespace.c:152
152 int old_cat = fsm_space_avail_to_cat(oldSpaceAvail);
(gdb)
Input parameters
(gdb) p *rel
$5 = {rd_node = {spcNode = 1663, dbNode = 16402, relNode = 50820}, rd_smgr = 0x2084b00, rd_refcnt = 1, rd_backend = -1,
rd_islocaltemp = false, rd_isnailed = false, rd_isvalid = true, rd_indexvalid = 1 '\001', rd_statvalid = false,
rd_createSubid = 0, rd_newRelfilenodeSubid = 0, rd_rel = 0x7fad0df139a0, rd_att = 0x7fad0df13ab8, rd_id = 50820,
rd_lockInfo = {lockRelId = {relId = 50820, dbId = 16402}}, rd_rules = 0x0, rd_rulescxt = 0x0, trigdesc = 0x0,
rd_rsdesc = 0x0, rd_fkeylist = 0x0, rd_fkeyvalid = false, rd_partkeycxt = 0x0, rd_partkey = 0x0, rd_pdcxt = 0x0,
rd_partdesc = 0x0, rd_partcheck = 0x0, rd_indexlist = 0x7fad0df12820, rd_oidindex = 0, rd_pkindex = 0,
rd_replidindex = 0, rd_statlist = 0x0, rd_indexattr = 0x0, rd_projindexattr = 0x0, rd_keyattr = 0x0, rd_pkattr = 0x0,
rd_idattr = 0x0, rd_projidx = 0x0, rd_pubactions = 0x0, rd_options = 0x0, rd_index = 0x0, rd_indextuple = 0x0,
rd_amhandler = 0, rd_indexcxt = 0x0, rd_amroutine = 0x0, rd_opfamily = 0x0, rd_opcintype = 0x0, rd_support = 0x0,
rd_supportinfo = 0x0, rd_indoption = 0x0, rd_indexprs = 0x0, rd_indpred = 0x0, rd_exclops = 0x0, rd_exclprocs = 0x0,
rd_exclstrats = 0x0, rd_amcache = 0x0, rd_indcollation = 0x0, rd_fdwroutine = 0x0, rd_toastoid = 0,
pgstat_info = 0x20785f0}
(gdb)
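1. Initialize local variables
2. If a local map exists, use it first by calling fsm_local_search
3. If there is neither a local map nor an FSM, create the local map and then call fsm_local_search
4. Search the FSM
4.1 Get the FSM category for the free space on the old page -> 0
4.2 Get the FSM category for the requested space -> 1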
(gdb) n
153 int search_cat = fsm_space_needed_to_cat(spaceNeeded);
(gdb)
159 addr = fsm_get_location(oldPage, &slot);
(gdb) p old_cat
$1 = 0
(gdb) p search_cat
$2 = 1
(gdb)
4.3 Compute the FSM location (FSMAddress) of the old heap block
(gdb) n
161 search_slot = fsm_set_and_search(rel, addr, slot, old_cat, search_cat);
(gdb) p addr
$3 = {level = 0, logpageno = 0}
(gdb)
4.4 Search for a target slot
(gdb) n
167 if (search_slot != -1)
(gdb) p search_slot
$4 = 4
(gdb)
4.5 If the target slot is valid, return the corresponding block; otherwise search for a suitable block with fsm_search
(gdb) n
168 return fsm_get_heap_blk(addr, search_slot);
(gdb)
171 }
(gdb)
RelationGetBufferForTuple (relation=0x7fad0df13788, len=32, otherBuffer=0, options=0, bistate=0x0, vmbuffer=0x7ffe1b797dcc,
vmbuffer_other=0x0) at hio.c:397
397 while (targetBlock != InvalidBlockNumber)
(gdb) p targetBlock
$6 = 4
(gdb)
DONE!
IV. References
Source: ITPUB blog, http://blog.itpub.net/6906/viewspace-2637613/. Please credit the source when reposting; otherwise legal liability will be pursued.