PostgreSQL Source Code (87): Array Construction and Computation (Flat Format and Expanded Format)

高铭杰

Last modified: 2022-10-19 19:00:35

First published: 2022-10-19 19:00:11

Article link: https://blog.csdn.net/jackgo73/article/details/127409441

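Related
"PostgreSQL Source Code (51): Implementation of Variable-Length Types (varlena.c)"
"PostgreSQL Source Code (56): Analysis of Expandable Types, ExpandedObject/ExpandedRecord"
"PostgreSQL Source Code (87): Array Construction and Computation (Flat Format and Expanded Format)"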

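Summary

One-sentence summary
The standard array constructor produces a compact flat ArrayType in which, as in a tuple, the data follows directly after the header. PL/pgSQL parses this flat structure into an expanded array structure and attaches it to a memory context (mcxt) that owns it, which makes computation convenient.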

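Basic concepts: one-dimensional '{1,2,3,4,5,6}'::int[]

ndims = 1 means the array is one-dimensional
p eah->dims[0] = 6 means there are 6 elements
p eah->lbound[0] = 1 is the lower bound of the subscript in the first dimension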

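Basic concepts: two-dimensional '{{1,2,3,4},{3,4,5,6},{5,6,7,8}}'::int[]

ndims = 2 means the array is two-dimensional
p eah->dims[0] = 3 means 3 rows
p eah->dims[1] = 4 means 4 columns
p eah->lbound[0] = 1 is the subscript lower bound, used for slicing
p eah->lbound[1] = 1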
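
To make the relationship between ndims and dims concrete, here is a minimal standalone sketch that multiplies the per-dimension sizes into a total element count. The function name count_items is hypothetical; it is only a simplified stand-in for what PostgreSQL's ArrayGetNItems computes, and it returns -1 where the real function would raise an error.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a PostgreSQL function): multiply the
 * per-dimension sizes to get the total number of elements.
 * Returns 0 when ndims <= 0 or some dimension has length 0, and -1 when
 * a size is negative or the product does not fit in an int.
 */
int
count_items(int ndims, const int *dims)
{
    long long total = 1;
    int       i;

    if (ndims <= 0)
        return 0;
    assert(dims != NULL);

    for (i = 0; i < ndims; i++)
    {
        if (dims[i] < 0)
            return -1;
        total *= dims[i];       /* cannot overflow: total <= INT_MAX here */
        if (total > INT_MAX)
            return -1;
    }
    return (int) total;
}
```

With dims = {6} this gives 6, and with dims = {3, 4} it gives 12, matching the two arrays above.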

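Flat array structure

The flat array structure is the one shown in the figure below (for the one-dimensional array '{1,2,3,4,5,6}'::int[]); it can also be called the compact structure or the storage structure. It is convenient to store but inconvenient to compute on.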

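Expanded array structure

This is the ExpandedArrayHeader data structure shown in the figure below.
It consists of the standard EOH header plus array-specific fields.
The function expand_array parses the flat structure and hooks the results onto the corresponding fields of the struct below.
Inside PL/pgSQL, array computation always uses the expanded array structure. Note: when an expanded array is passed as a value, what is passed is a pointer to the EOH's eoh_rw_ptr, which holds a 1b_e structure (varattrib_1b_e, a varlena with a 1-byte header); the 1b_e in turn records the pointer to the EOH header. (For the 1b_e structure, see "PostgreSQL Source Code (51): Implementation of Variable-Length Types (varlena.c)".)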

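EOH review

"PostgreSQL Source Code (56): Analysis of Expandable Types, ExpandedObject/ExpandedRecord"

Each review yields a little more insight into the design:

[Figure: ExpandedObjectHeader (EOH) structure]

EOH structure: complex data types such as arrays and records usually have a compact on-disk format that is inconvenient to modify, because a modification has to copy everything that follows. PG therefore provides an "expanded" representation, which is used only in memory and is optimized for computation.
EOH structure: the header starts with 4 bytes of control bits so that it fits PG's varlena header layout.
EOH structure: at the tail are two 10-byte arrays, eoh_rw_ptr and eoh_ro_ptr. Both hold a 1b_e structure carrying the same EOH address; they differ only in the tag (VARTAG_EXPANDED_RW versus VARTAG_EXPANDED_RO). Why two pointers? Because the EOH comes with some handler functions, such as the two below, that require the caller to pass in the eoh_rw_ptr pointer. Passing eoh_ro_ptr instead dumps core in an assert-enabled build (the restriction is enforced only by an Assert).
- TransferExpandedObject: changes the parent of the EOH's memory context
- DeleteExpandedObject: deletes the EOH's memory context and its contents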

struct ExpandedObjectHeader
{
	/* Phony varlena header */
	int32		vl_len_;		/* always EOH_HEADER_MAGIC, see below */

	/* Pointer to methods required for object type */
	const ExpandedObjectMethods *eoh_methods;

	/* Memory context containing this header and subsidiary data */
	MemoryContext eoh_context;

	/* Standard R/W TOAST pointer for this object is kept here */
	char		eoh_rw_ptr[EXPANDED_POINTER_SIZE];

	/* Standard R/O TOAST pointer for this object is kept here */
	char		eoh_ro_ptr[EXPANDED_POINTER_SIZE];
};
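
The R/W versus R/O pointer idea can be sketched outside PostgreSQL with a toy header. Everything below (ToyHeader, toy_init, toy_get_header, toy_is_rw, the TOY_ constants) is hypothetical and only mimics the shape of the real layout: a 1-byte marker, a 1-byte tag, then the address of the owning header. The tag values 2 and 3 are modeled on VARTAG_EXPANDED_RO and VARTAG_EXPANDED_RW, and the marker byte 0x01 is an assumption for a little-endian machine.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { TOY_TAG_RO = 2, TOY_TAG_RW = 3 };

/* marker byte + tag byte + pointer payload (10 bytes with 8-byte pointers) */
#define TOY_POINTER_SIZE (2 + sizeof(void *))

typedef struct ToyHeader
{
    int32_t magic;                      /* phony length word, always -1 */
    char    rw_ptr[TOY_POINTER_SIZE];   /* read-write self reference */
    char    ro_ptr[TOY_POINTER_SIZE];   /* read-only self reference */
} ToyHeader;

/* Fill one embedded pointer: marker, tag, then the header's own address. */
static void
toy_fill(char *dst, char tag, ToyHeader *self)
{
    dst[0] = 0x01;
    dst[1] = tag;
    memcpy(dst + 2, &self, sizeof(self));
}

void
toy_init(ToyHeader *h)
{
    assert(h != NULL);
    h->magic = -1;
    toy_fill(h->rw_ptr, TOY_TAG_RW, h);
    toy_fill(h->ro_ptr, TOY_TAG_RO, h);
}

/* Recover the header address from either embedded pointer. */
ToyHeader *
toy_get_header(const char *ptr)
{
    ToyHeader *h;

    assert(ptr[1] == TOY_TAG_RO || ptr[1] == TOY_TAG_RW);
    memcpy(&h, ptr + 2, sizeof(h));
    return h;
}

/* Only a read-write pointer may be used to modify or delete the object. */
int
toy_is_rw(const char *ptr)
{
    return ptr[1] == TOY_TAG_RW;
}
```

Both embedded pointers lead back to the same header; a function that modifies or frees the object would first check toy_is_rw, which is the role the Assert plays in TransferExpandedObject and DeleteExpandedObject.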

EOH extended for arrays: ExpandedArrayHeader

Data structure:
[Figure: ExpandedArrayHeader data structure]

typedef struct ExpandedArrayHeader
{
	/* Standard header for expanded objects */
	ExpandedObjectHeader hdr;

	/* Magic value identifying an expanded array (for debugging only) */
	int			ea_magic;

	/* Dimensionality info (always valid) */
	int			ndims;			/* # of dimensions */
	int		   *dims;			/* array dimensions */
	int		   *lbound;			/* index lower bounds for each dimension */

	/* Element type info (always valid) */
	Oid			element_type;	/* element type OID */
	int16		typlen;			/* needed info about element datatype */
	bool		typbyval;
	char		typalign;

	/*
	 * If we have a Datum-array representation of the array, it's kept here;
	 * else dvalues/dnulls are NULL.  The dvalues and dnulls arrays are always
	 * palloc'd within the object private context, but may change size from
	 * time to time.  For pass-by-ref element types, dvalues entries might
	 * point either into the fstartptr..fendptr area, or to separately
	 * palloc'd chunks.  Elements should always be fully detoasted, as they
	 * are in the standard flat representation.
	 *
	 * Even when dvalues is valid, dnulls can be NULL if there are no null
	 * elements.
	 */
	Datum	   *dvalues;		/* array of Datums */
	bool	   *dnulls;			/* array of is-null flags for Datums */
	int			dvalueslen;		/* allocated length of above arrays */
	int			nelems;			/* number of valid entries in above arrays */

	/*
	 * flat_size is the current space requirement for the flat equivalent of
	 * the expanded array, if known; otherwise it's 0.  We store this to make
	 * consecutive calls of get_flat_size cheap.
	 */
	Size		flat_size;

	/*
	 * fvalue points to the flat representation if it is valid, else it is
	 * NULL.  If we have or ever had a flat representation then
	 * fstartptr/fendptr point to the start and end+1 of its data area; this
	 * is so that we can tell which Datum pointers point into the flat
	 * representation rather than being pointers to separately palloc'd data.
	 */
	ArrayType  *fvalue;			/* must be a fully detoasted array */
	char	   *fstartptr;		/* start of its data area */
	char	   *fendptr;		/* end+1 of its data area */
} ExpandedArrayHeader;

Test SQL

DO $$
DECLARE
  arr int[] = ARRAY[1,2,3,4,5,6];
BEGIN
  raise notice '%', arr[3];
END;
$$;

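Step 1: array construction with construct_md_array

When execution reaches arr int[] = ARRAY[1,2,3,4,5,6];, the planner enters construct_md_array while simplifying the constant expression (eval_const_expressions).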

plpgsql_inline_handler
  plpgsql_exec_function
    ...
    exec_assign_expr
      exec_prepare_plan
        exec_simple_check_plan
          ...
          BuildCachedPlan
            pg_plan_queries
              pg_plan_query
                planner
                  ...
                  eval_const_expressions
                    ...
                    ExecInterpExpr
                      ExecEvalArrayExpr
                        construct_md_array

The construct_md_array function

ArrayType *
construct_md_array(Datum *elems,
				   bool *nulls,
				   int ndims,
				   int *dims,
				   int *lbs,
				   Oid elmtype, int elmlen, bool elmbyval, char elmalign)
{

Input arguments

(elems=0x2b130d8, 
 nulls=0x2b13128, 
 ndims=1,                 --> how many dimensions? ndims = 1
 dims=0x7ffdcf177ae0,     --> how large is each dimension? dims[0] = 6
 lbs=0x7ffdcf177ac0,      --> lower bounds: lbs[0] = 1; subscripts of this array start at 1
 elmtype=23, 
 elmlen=4, 
 elmbyval=true, 
 elmalign=105 'i')

The lbs argument deserves a special mention, because PG arrays support this usage:

 postgres=# select f1[2] from (select '[2:3]={1,2}'::int[] as f1);
 f1 
----
  1
(1 row)

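So lower bounds may also be supplied at construction time. In the example above the lower bound starts at 2, so the third argument of ArrayCheckBounds, int *lb, is given {2}.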
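
How a subscript with a non-default lower bound maps to an element position can be sketched as follows. The function toy_subscript_to_offset is hypothetical; it follows the usual row-major scheme (the last dimension varies fastest), and its -1 return value for an out-of-range subscript is just a signal chosen for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: map a subscript tuple to a zero-based, row-major
 * element offset, honoring the per-dimension lower bounds.
 * Returns -1 if a subscript lies outside [lbs[i], lbs[i] + dims[i] - 1].
 */
int
toy_subscript_to_offset(int ndims, const int *dims, const int *lbs,
                        const int *indx)
{
    int offset = 0;
    int scale = 1;
    int i;

    assert(dims != NULL && lbs != NULL && indx != NULL);

    /* walk from the last dimension to the first */
    for (i = ndims - 1; i >= 0; i--)
    {
        if (indx[i] < lbs[i] || indx[i] - lbs[i] >= dims[i])
            return -1;
        offset += (indx[i] - lbs[i]) * scale;
        scale *= dims[i];
    }
    return offset;
}
```

For '[2:3]={1,2}' (dims = {2}, lbs = {2}) the subscript 2 maps to offset 0, the first stored element, which is why f1[2] returns 1 in the query above.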

	ArrayType  *result;
	bool		hasnulls;
	int32		nbytes;
	int32		dataoffset;
	int			i;
	int			nelems;

	/* This checks for overflow of the array dimensions */
	nelems = ArrayGetNItems(ndims, dims);

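For each dimension, check that the given lower bound is not too large. In this case:

ndims=1

so only lbs[0] needs checking: lbs[0] + dims[0] = 1 + 6 must not overflow a 32-bit int, and it does not.

If ndims=2, lbs[1] has to be checked as well.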

	ArrayCheckBounds(ndims, dims, lbs);

Data was passed in here (nelems=6), so the empty-array path is not taken.

	/* if ndims <= 0 or any dims[i] == 0, return empty array */
	if (nelems <= 0)
		return construct_empty_array(elmtype);

	nbytes = 0;
	hasnulls = false;

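att_addlength_datum adds the element's length.
att_align_nominal rounds the running length up to the element's alignment. Here elmalign='i' means int (4-byte) alignment, and each element is 4 bytes long, so no padding is needed.

In the end the 6 numbers need nbytes = 6 x 4 = 24 bytes in total.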

	for (i = 0; i < nelems; i++)
	{
		if (nulls && nulls[i])
		{
			hasnulls = true;
			continue;
		}
		nbytes = att_addlength_datum(nbytes, elmlen, elems[i]);
		nbytes = att_align_nominal(nbytes, elmalign);
	}
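
The length loop above can be imitated in a few lines of standalone C. This is only a sketch for fixed-length elements: toy_align_up and toy_data_bytes are hypothetical names, and the real macros also handle variable-length and cstring elements, which are ignored here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Round len up to a multiple of alignment, which must be a power of two. */
size_t
toy_align_up(size_t len, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (len + alignment - 1) & ~(alignment - 1);
}

/*
 * Data bytes needed by nelems fixed-length elements.
 * NULL elements occupy no space in the data area, so they are skipped,
 * just as the loop in construct_md_array skips them.
 */
size_t
toy_data_bytes(int nelems, const bool *nulls, size_t elmlen, size_t alignment)
{
    size_t nbytes = 0;
    int    i;

    for (i = 0; i < nelems; i++)
    {
        if (nulls && nulls[i])
            continue;
        nbytes += elmlen;                           /* add the element */
        nbytes = toy_align_up(nbytes, alignment);   /* pad for the next one */
    }
    return nbytes;
}
```

For six 4-byte integers with 4-byte alignment and no NULLs this yields 24, the same value as nbytes after the loop.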

	/* Allocate and initialize result array */
	if (hasnulls)
	{
		dataoffset = ARR_OVERHEAD_WITHNULLS(ndims, nelems);
		nbytes += dataoffset;
	}
	else
	{
		dataoffset = 0;			/* marker for no null bitmap */
		nbytes += ARR_OVERHEAD_NONULLS(ndims);
	}
	result = (ArrayType *) palloc0(nbytes);
	SET_VARSIZE(result, nbytes);

Checking the length:
nbytes = 48 (24 bytes of element data plus ARR_OVERHEAD_NONULLS(1) = 24 bytes of header)

(gdb) p ((varattrib_4b*)result)->va_4byte->va_header>>2
$116 = 48
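
The total of 48 bytes can be reproduced with a small calculation. This is a sketch under stated assumptions: the fixed part of the ArrayType header is taken to be 16 bytes (four 4-byte fields), int is 4 bytes, and MAXALIGN rounds up to 8 bytes as on a typical 64-bit build. The names toy_flat_array_size, TOY_MAXALIGN and TOY_ARRAY_HEADER_SIZE are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: maximum alignment is 8 bytes, as on typical 64-bit builds. */
#define TOY_MAXALIGN(len)     (((size_t) (len) + 7) & ~(size_t) 7)

/* Assumption: vl_len_, ndim, dataoffset, elemtype are 4 bytes each. */
#define TOY_ARRAY_HEADER_SIZE 16

/*
 * Total size of a flat array: fixed header, then dims[] and lbound[]
 * (one int each per dimension), then an optional null bitmap with one
 * bit per element, all rounded up to the maximum alignment, followed by
 * data_bytes of element data.
 */
size_t
toy_flat_array_size(int ndims, int nelems, size_t data_bytes, int hasnulls)
{
    size_t overhead;

    assert(ndims > 0 && nelems > 0);
    overhead = TOY_ARRAY_HEADER_SIZE + 2 * sizeof(int) * (size_t) ndims;
    if (hasnulls)
        overhead += ((size_t) nelems + 7) / 8;
    return TOY_MAXALIGN(overhead) + data_bytes;
}
```

For the test array (ndims = 1, nelems = 6, 24 data bytes, no NULLs) the overhead is 16 + 8 = 24 bytes, so the total is 48, which agrees with the gdb output.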

	result->ndim = ndims;
	result->dataoffset = dataoffset;
	result->elemtype = elmtype;
	memcpy(ARR_DIMS(result), dims, ndims * sizeof(int));
	memcpy(ARR_LBOUND(result), lbs, ndims * sizeof(int));

	CopyArrayEls(result,
				 elems, nulls, nelems,
				 elmlen, elmbyval, elmalign,
				 false);

	return result;
}

Final memory layout
[Figure: final memory layout of the flat ArrayType]

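Step 2: before the assignment, expand_array converts the ArrayType into an ExpandedArray

Once the right-hand side of arr int[] = ARRAY[1,2,3,4,5,6]; has been evaluated, the ArrayType structure shown in the figure above exists. It now has to be wrapped in an expanded array structure for use, which gives the array a parent mcxt and therefore an owner.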

Datum
expand_array(Datum arraydatum, MemoryContext parentcontext,
			 ArrayMetaState *metacache)
{
	ArrayType  *array;
	ExpandedArrayHeader *eah;
	MemoryContext objcxt;
	MemoryContext oldcxt;
	ArrayMetaState fakecache;

Create an "expanded array" memory context as a child of the parent context passed in, which here is "SPI Proc".

	objcxt = AllocSetContextCreate(parentcontext,
								   "expanded array",
								   ALLOCSET_START_SMALL_SIZES);

	/* Set up expanded array header */
	eah = (ExpandedArrayHeader *)
		MemoryContextAlloc(objcxt, sizeof(ExpandedArrayHeader));

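Initialize the EOH structure

eah->hdr: the array header embeds the EOH as its first member, so &eah->hdr is the EOH to initialize
EA_methods: the array-specific conversion functions EA_get_flat_size and EA_flatten_into, used to convert the expanded structure back into the storage format, that is, the compact ArrayType structure in the figure above
objcxt: the memory context to record in the header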

	EOH_init_header(&eah->hdr, &EA_methods, objcxt);
	eah->ea_magic = EA_MAGIC;

Now the compact structure is unpacked into the ExpandedArrayHeader struct.

First switch to the "expanded array" context and copy the flat array data into it.

	oldcxt = MemoryContextSwitchTo(objcxt);
	array = DatumGetArrayTypePCopy(arraydatum);
	MemoryContextSwitchTo(oldcxt);

p eah->ndims = 1
p eah->dims[0] = 6
p eah->lbound[0] = 1
p eah->element_type = 23
p eah->typlen = 4
p eah->typbyval = true
p eah->typalign = 'i'

	eah->ndims = ARR_NDIM(array);
	/* note these pointers point into the fvalue header! */
	eah->dims = ARR_DIMS(array);
	eah->lbound = ARR_LBOUND(array);
	eah->element_type = ARR_ELEMTYPE(array);
	...
		get_typlenbyvalalign(eah->element_type,
							 &eah->typlen,
							 &eah->typbyval,
							 &eah->typalign);
	...

	/* we don't make a deconstructed representation now */
	eah->dvalues = NULL;
	eah->dnulls = NULL;
	eah->dvalueslen = 0;
	eah->nelems = 0;
	eah->flat_size = 0;

eah->fvalue points to the flat header
fstartptr points to the start of the flat data area
fendptr points to the end of the whole flat structure

	/* remember we have a flat representation */
	eah->fvalue = array;
	eah->fstartptr = ARR_DATA_PTR(array);
	eah->fendptr = ((char *) array) + ARR_SIZE(array);

Note that what is returned is the EOH's eoh_rw_ptr pointer (to recap: eoh_rw_ptr holds a 1b_e whose data part stores a pointer to the EOH header).

	/* return a R/W pointer to the expanded array */
	return EOHPGetRWDatum(&eah->hdr);
}
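
The way expand_array leaves dims and lbound pointing into the copied flat array can be illustrated with a toy model. All names below (ToyFlat, ToyExpanded, toy_build_flat_1d, toy_expand and the TOY_ macros) are hypothetical, plain malloc stands in for the private memory context, and the layout only imitates ArrayType under the assumptions of 4-byte int and 8-byte maximum alignment.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy flat header; dims[], lbound[] and the data follow it in memory. */
typedef struct ToyFlat
{
    int total_size;     /* size of the whole chunk in bytes */
    int ndim;
    int dataoffset;     /* 0 means "no null bitmap" */
    int elemtype;
} ToyFlat;

#define TOY_DIMS(a)     ((int *) ((char *) (a) + sizeof(ToyFlat)))
#define TOY_LBOUND(a)   (TOY_DIMS(a) + (a)->ndim)
#define TOY_HDRSIZE(nd) \
    ((sizeof(ToyFlat) + 2 * sizeof(int) * (size_t) (nd) + 7) & ~(size_t) 7)
#define TOY_DATA(a)     ((char *) (a) + TOY_HDRSIZE((a)->ndim))

typedef struct ToyExpanded
{
    int      ndims;
    int     *dims;          /* points into fvalue's header */
    int     *lbound;        /* points into fvalue's header */
    ToyFlat *fvalue;        /* private copy of the flat array */
    char    *fstartptr;     /* start of its data area */
    char    *fendptr;       /* end+1 of its data area */
} ToyExpanded;

/* Build a one-dimensional flat int array with lower bound lb. */
ToyFlat *
toy_build_flat_1d(const int *elems, int n, int lb)
{
    size_t   size;
    ToyFlat *a;

    assert(elems != NULL && n > 0);
    size = TOY_HDRSIZE(1) + sizeof(int) * (size_t) n;
    a = calloc(1, size);
    if (a == NULL)
        return NULL;
    a->total_size = (int) size;
    a->ndim = 1;
    a->dataoffset = 0;
    a->elemtype = 23;       /* the int4 OID seen in the gdb output */
    TOY_DIMS(a)[0] = n;
    TOY_LBOUND(a)[0] = lb;
    memcpy(TOY_DATA(a), elems, sizeof(int) * (size_t) n);
    return a;
}

/* Copy the flat array and wrap the copy in an expanded header. */
ToyExpanded *
toy_expand(const ToyFlat *src)
{
    ToyExpanded *e;
    ToyFlat     *copy;

    assert(src != NULL);
    e = malloc(sizeof(ToyExpanded));
    copy = malloc((size_t) src->total_size);
    if (e == NULL || copy == NULL)
    {
        free(e);
        free(copy);
        return NULL;
    }
    memcpy(copy, src, (size_t) src->total_size);
    e->ndims = copy->ndim;
    e->dims = TOY_DIMS(copy);       /* no separate allocation */
    e->lbound = TOY_LBOUND(copy);
    e->fvalue = copy;
    e->fstartptr = TOY_DATA(copy);
    e->fendptr = (char *) copy + copy->total_size;
    return e;
}
```

Because dims and lbound are not copied out, they stay valid only as long as the private flat copy does; the caller of this sketch releases an expanded value by freeing both e->fvalue and e.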
