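Related
"PostgreSQL Source Code (51): Variable-Length Type Implementation (varlena.c)"
"PostgreSQL Source Code (56): Analysis of Expandable Types, ExpandedObject/ExpandedRecord"
"PostgreSQL Source Code (87): Array Construction and Computation (Flat Format and Expanded Format)"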
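Summary
One-sentence summary
The standard array constructor functions produce the compact flat structure ArrayType, in which the data follows the header, much like a tuple. PL/pgSQL parses this flat structure into the expanded array structure and attaches it to a memory context (mcxt), which gives the array an owner and makes computation easier.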
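Basic concepts: one-dimensional '{1,2,3,4,5,6}'::int[]
ndims = 1 means one dimension
p eah->dims[0] = 6 means there are 6 elements
p eah->lbound[0] = 1 is the lower bound of the subscript in the first dimension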
Basic concepts: two-dimensional '{{1,2,3,4},{3,4,5,6},{5,6,7,8}}'::int[]
ndims = 2 means two dimensions
p eah->dims[0] = 3 means 3 rows
p eah->dims[1] = 4 means 4 columns
p eah->lbound[0] = 1 is the subscript lower bound, used for slicing
p eah->lbound[1] = 1
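To make the roles of ndims, dims, and lbound concrete, here is a minimal standalone sketch (not PostgreSQL code). The helpers item_count and linear_offset are hypothetical names introduced only for illustration; they assume row-major element order, which matches how the two-dimensional literal above lists its rows.

```c
#include <stddef.h>

/* Hypothetical helper: the element count is the product of all dimension sizes. */
static int
item_count(int ndims, const int *dims)
{
    int n = 1;

    if (ndims <= 0)
        return 0;
    for (int i = 0; i < ndims; i++)
        n *= dims[i];
    return n;
}

/*
 * Hypothetical helper: map a subscript (one value per dimension) to a
 * zero-based row-major offset. Each subscript is first made relative to the
 * lower bound of its dimension. Returns -1 when the subscript is out of range.
 */
static int
linear_offset(int ndims, const int *dims, const int *lbound, const int *subscript)
{
    int offset = 0;

    for (int i = 0; i < ndims; i++)
    {
        int rel = subscript[i] - lbound[i];

        if (rel < 0 || rel >= dims[i])
            return -1;
        offset = offset * dims[i] + rel;
    }
    return offset;
}
```

With dims = {3, 4} and lbound = {1, 1}, the subscript [2][3] maps to offset (2-1)*4 + (3-1) = 6, which is the value 5 in the flattened element list 1,2,3,4,3,4,5,6,5,6,7,8.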
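Flat array structure
- The flat array structure is the one shown in the figure below (for the one-dimensional array '{1,2,3,4,5,6}'::int[]). It can also be called the compact structure or the storage structure: convenient to store, inconvenient to compute on.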
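Expanded array structure
- This is the ExpandedArrayHeader data structure shown in the figure below.
- It consists of the standard EOH header plus array-specific fields.
- The function expand_array parses the flat structure and attaches the results to the corresponding fields of the struct below.
- Array computation inside PL/pgSQL always uses the expanded array structure. Note: when an expanded array is passed as a value, what is passed is the EOH's eoh_rw_ptr pointer. It points to a 1be structure (varattrib_1b_e), and the 1be structure stores the pointer to the EOH header. (For the 1be structure, see "PostgreSQL Source Code (51): Variable-Length Type Implementation (varlena.c)".)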
EOH review
"PostgreSQL Source Code (56): Analysis of Expandable Types, ExpandedObject/ExpandedRecord"
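Every review brings a little more insight into the design:
- EOH structure: complex data types such as arrays and records usually have a compact on-disk format that is awkward to modify, because a modification has to copy everything that follows. PG provides an "expanded" representation that is used only in memory and is better optimized for computation.
- EOH structure: the header starts with a 4-byte control word so that it fits PG's varlena variable-length header layout.
- EOH structure: the tail holds two 10-byte arrays, eoh_rw_ptr and eoh_ro_ptr. Both hold a TOAST pointer in 1be format that carries the address of the same EOH. Why two? Because the EOH comes with helper functions, such as the two below, that require the caller to come in with the eoh_rw_ptr pointer. Coming in with eoh_ro_ptr crashes (the restriction is enforced only by an Assert).
- TransferExpandedObject: changes the parent memory context of the EOH
- DeleteExpandedObject: deletes the EOH's memory context and its contents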
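The read/write versus read-only pointer idea can be sketched as a standalone toy model. Everything below is an assumption made for illustration, not the PostgreSQL implementation: the ToyHeader type and the toy_* helpers are hypothetical, the 0x01 marker byte stands in for the 1be header byte, and the tag values 2 and 3 mirror VARTAG_EXPANDED_RO and VARTAG_EXPANDED_RW. On a platform with 8-byte pointers, each buffer is 2 + 8 = 10 bytes, matching the 10-byte arrays described above.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_HDR_1B_E  0x01                  /* toy marker byte for a "1be" datum */
#define TOY_TAG_RO    2                     /* mirrors VARTAG_EXPANDED_RO */
#define TOY_TAG_RW    3                     /* mirrors VARTAG_EXPANDED_RW */
#define TOY_PTR_SIZE  (2 + sizeof(void *))  /* marker + tag + object address */

/* Hypothetical stand-in for the tail of ExpandedObjectHeader. */
typedef struct ToyHeader
{
    char rw_ptr[TOY_PTR_SIZE];
    char ro_ptr[TOY_PTR_SIZE];
} ToyHeader;

/* Write marker byte, tag byte, then the raw address of the object. */
static void
toy_fill(char *buf, uint8_t tag, void *obj)
{
    buf[0] = TOY_HDR_1B_E;
    buf[1] = (char) tag;
    memcpy(buf + 2, &obj, sizeof(void *));
}

/* Both buffers carry the same address; only the tag differs. */
static void
toy_init(ToyHeader *h)
{
    toy_fill(h->rw_ptr, TOY_TAG_RW, h);
    toy_fill(h->ro_ptr, TOY_TAG_RO, h);
}

/* Recover the object address stored after the two leading bytes. */
static void *
toy_get_object(const char *buf)
{
    void *obj;

    memcpy(&obj, buf + 2, sizeof(void *));
    return obj;
}

/* A function that modifies or deletes the object would check this first. */
static bool
toy_is_rw(const char *buf)
{
    return (uint8_t) buf[1] == TOY_TAG_RW;
}
```

The point of the design is that a callee can tell from the tag alone whether it is allowed to modify the object in place, while both pointers still resolve to the same header.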
struct ExpandedObjectHeader
{
/* Phony varlena header */
int32 vl_len_; /* always EOH_HEADER_MAGIC, see below */
/* Pointer to methods required for object type */
const ExpandedObjectMethods *eoh_methods;
/* Memory context containing this header and subsidiary data */
MemoryContext eoh_context;
/* Standard R/W TOAST pointer for this object is kept here */
char eoh_rw_ptr[EXPANDED_POINTER_SIZE];
/* Standard R/O TOAST pointer for this object is kept here */
char eoh_ro_ptr[EXPANDED_POINTER_SIZE];
};
EOH extension for arrays: ExpandedArrayHeader
Data structure:
typedef struct ExpandedArrayHeader
{
/* Standard header for expanded objects */
ExpandedObjectHeader hdr;
/* Magic value identifying an expanded array (for debugging only) */
int ea_magic;
/* Dimensionality info (always valid) */
int ndims; /* # of dimensions */
int *dims; /* array dimensions */
int *lbound; /* index lower bounds for each dimension */
/* Element type info (always valid) */
Oid element_type; /* element type OID */
int16 typlen; /* needed info about element datatype */
bool typbyval;
char typalign;
/*
* If we have a Datum-array representation of the array, it's kept here;
* else dvalues/dnulls are NULL. The dvalues and dnulls arrays are always
* palloc'd within the object private context, but may change size from
* time to time. For pass-by-ref element types, dvalues entries might
* point either into the fstartptr..fendptr area, or to separately
* palloc'd chunks. Elements should always be fully detoasted, as they
* are in the standard flat representation.
*
* Even when dvalues is valid, dnulls can be NULL if there are no null
* elements.
*/
Datum *dvalues; /* array of Datums */
bool *dnulls; /* array of is-null flags for Datums */
int dvalueslen; /* allocated length of above arrays */
int nelems; /* number of valid entries in above arrays */
/*
* flat_size is the current space requirement for the flat equivalent of
* the expanded array, if known; otherwise it's 0. We store this to make
* consecutive calls of get_flat_size cheap.
*/
Size flat_size;
/*
* fvalue points to the flat representation if it is valid, else it is
* NULL. If we have or ever had a flat representation then
* fstartptr/fendptr point to the start and end+1 of its data area; this
* is so that we can tell which Datum pointers point into the flat
* representation rather than being pointers to separately palloc'd data.
*/
ArrayType *fvalue; /* must be a fully detoasted array */
char *fstartptr; /* start of its data area */
char *fendptr; /* end+1 of its data area */
} ExpandedArrayHeader;
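The comments in the struct say that dvalues is NULL until a Datum-array representation exists. A minimal standalone sketch of that lazy pattern follows. The ToyExpandedInts type and toy_deconstruct are hypothetical names, long stands in for Datum, and a plain int buffer stands in for the flat data area; this is an illustration of the idea, not the PostgreSQL routine.

```c
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-in for ExpandedArrayHeader. */
typedef struct ToyExpandedInts
{
    const int *flat_data;   /* stands in for the data area of fvalue */
    int        flat_count;  /* number of elements in the flat data */
    long      *dvalues;     /* stands in for Datum *dvalues; NULL until built */
    int        nelems;      /* number of valid entries in dvalues */
} ToyExpandedInts;

/*
 * Build the per-element array on first use only. Returns 0 on success and
 * -1 if the allocation fails. A second call is a no-op.
 */
static int
toy_deconstruct(ToyExpandedInts *ea)
{
    if (ea->dvalues != NULL)
        return 0;               /* already deconstructed */

    ea->dvalues = malloc(sizeof(long) * (size_t) ea->flat_count);
    if (ea->dvalues == NULL)
        return -1;

    for (int i = 0; i < ea->flat_count; i++)
        ea->dvalues[i] = ea->flat_data[i];
    ea->nelems = ea->flat_count;
    return 0;
}
```

Deferring the work this way means an array that is only passed around, and never subscripted or modified, does not pay for the per-element array.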
Test SQL
DO $$
DECLARE
arr int[] = ARRAY[1,2,3,4,5,6];
BEGIN
raise notice '%', arr[3];
END;
$$;
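Step 1: array construction: construct_md_array
Execution reaches arr int[] = ARRAY[1,2,3,4,5,6];
construct_md_array is entered while the planner simplifies constant expressions (eval_const_expressions).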
plpgsql_inline_handler
plpgsql_exec_function
...
exec_assign_expr
exec_prepare_plan
exec_simple_check_plan
...
BuildCachedPlan
pg_plan_queries
pg_plan_query
planner
...
eval_const_expressions
...
ExecInterpExpr
ExecEvalArrayExpr
construct_md_array
The construct_md_array function
ArrayType *
construct_md_array(Datum *elems,
bool *nulls,
int ndims,
int *dims,
int *lbs,
Oid elmtype, int elmlen, bool elmbyval, char elmalign)
{
Input arguments
(elems=0x2b130d8,
nulls=0x2b13128,
ndims=1, --> how many dimensions? ndims = 1
dims=0x7ffdcf177ae0, --> how large is each dimension? dims[0] = 6
lbs=0x7ffdcf177ac0, --> subscript lower bounds: lbs[0] = 1; the subscripts of this array start at 1
elmtype=23,
elmlen=4,
elmbyval=true,
elmalign=105 'i')
lbs deserves a special mention here, because PG arrays support this usage:
postgres=# select f1[2] from (select '[2:3]={1,2}'::int[] as f1);
f1
----
1
(1 row)
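So a lower bound may also be supplied at construction time. In the example above the subscripts start at 2, so the third parameter of ArrayCheckBounds, int *lb, is given {2}.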
ArrayType *result;
bool hasnulls;
int32 nbytes;
int32 dataoffset;
int i;
int nelems;
/* This checks for overflow of the array dimensions */
nelems = ArrayGetNItems(ndims, dims);
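For each dimension, check that the given lower bound is not too large (the sum lbs[i] + dims[i] must not overflow a 32-bit integer). In this case:
ndims=1
so only lbs[0] needs to be checked; lbs[0]=1 is far below the limit
and passes the check.
If ndims=2, lbs[1] would be checked as well.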
ArrayCheckBounds(ndims, dims, lbs);
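/*
 * A standalone sketch of the kind of check described above, assuming the rule
 * is "dims[i] + lb[i] must fit in a signed 32-bit integer". toy_bounds_ok is
 * a hypothetical name; the real function reports an error instead of
 * returning a flag.
 */

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: return true when, for every dimension, the lower
 * bound plus the dimension size still fits in a signed 32-bit integer.
 * The sum is computed in 64 bits so the check itself cannot overflow.
 */
static bool
toy_bounds_ok(int ndim, const int32_t *dims, const int32_t *lb)
{
    for (int i = 0; i < ndim; i++)
    {
        int64_t sum = (int64_t) dims[i] + (int64_t) lb[i];

        if (sum > INT32_MAX || sum < INT32_MIN)
            return false;
    }
    return true;
}
```

For the test array, dims[0] = 6 and lbs[0] = 1 give a sum of 7, which passes. A lower bound close to the 32-bit maximum would be rejected.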
Data was passed in here (nelems=6), so the empty-array branch below is not taken.
/* if ndims <= 0 or any dims[i] == 0, return empty array */
if (nelems <= 0)
return construct_empty_array(elmtype);
nbytes = 0;
hasnulls = false;
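att_addlength_datum adds the length of each element.
att_align_nominal rounds the running length up to the alignment boundary. Here elmalign='i' means int alignment, and with an element length of 4 no padding is needed.
In the end the 6 numbers need nbytes = 6 x 4 = 24 bytes in total.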
for (i = 0; i < nelems; i++)
{
if (nulls && nulls[i])
{
hasnulls = true;
continue;
}
nbytes = att_addlength_datum(nbytes, elmlen, elems[i]);
nbytes = att_align_nominal(nbytes, elmalign);
}
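/*
 * A standalone sketch of the size loop above for fixed-length elements.
 * TOY_ALIGN and toy_data_bytes are hypothetical names; the alignment is
 * passed as a byte count (4 for 'i') rather than as a typalign character.
 */

```c
#include <stdbool.h>
#include <stddef.h>

/* Round len up to the next multiple of alignval (alignval is a power of 2). */
#define TOY_ALIGN(alignval, len) \
    (((size_t) (len) + ((size_t) (alignval) - 1)) & ~((size_t) (alignval) - 1))

/*
 * Hypothetical helper: total bytes of the data area for nelems fixed-length
 * elements. NULL elements occupy no space in the data area, so they are
 * skipped, exactly as the loop above skips them.
 */
static size_t
toy_data_bytes(int nelems, size_t elmlen, size_t alignval, const bool *nulls)
{
    size_t nbytes = 0;

    for (int i = 0; i < nelems; i++)
    {
        if (nulls && nulls[i])
            continue;
        nbytes += elmlen;                       /* add the element length */
        nbytes = TOY_ALIGN(alignval, nbytes);   /* pad to the alignment */
    }
    return nbytes;
}
```

Six 4-byte integers with 4-byte alignment give 24 bytes, the same figure as above. A 1-byte element with 4-byte alignment would be padded to 4 bytes per element.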
/* Allocate and initialize result array */
if (hasnulls)
{
dataoffset = ARR_OVERHEAD_WITHNULLS(ndims, nelems);
nbytes += dataoffset;
}
else
{
dataoffset = 0; /* marker for no null bitmap */
nbytes += ARR_OVERHEAD_NONULLS(ndims);
}
result = (ArrayType *) palloc0(nbytes);
SET_VARSIZE(result, nbytes);
Checking the length:
nbytes = 48
(gdb) p ((varattrib_4b*)result)->va_4byte->va_header>>2
$116 = 48
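/*
 * A standalone sketch of where 48 comes from and why the gdb expression
 * shifts right by 2. Assumptions: an 8-byte maximum alignment, a 16-byte
 * fixed array header, and a little-endian build where the 4-byte varlena
 * header stores the length shifted left by 2. All toy_* names are
 * hypothetical.
 */

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_MAXALIGN(len)   (((size_t) (len) + 7) & ~(size_t) 7)
#define TOY_ARRAYTYPE_SIZE  16  /* vl_len_ + ndim + dataoffset + elemtype */

/* Header size without a null bitmap: fixed part plus dims[] and lbound[]. */
static size_t
toy_overhead_nonulls(int ndims)
{
    return TOY_MAXALIGN(TOY_ARRAYTYPE_SIZE + 2 * sizeof(int32_t) * (size_t) ndims);
}

/* Header size with a null bitmap: one extra bit per element, rounded up. */
static size_t
toy_overhead_withnulls(int ndims, int nitems)
{
    return TOY_MAXALIGN(TOY_ARRAYTYPE_SIZE
                        + 2 * sizeof(int32_t) * (size_t) ndims
                        + (size_t) (nitems + 7) / 8);
}

/* Little-endian 4-byte header: the two low bits are flags, the rest is length. */
static uint32_t
toy_encode_len(uint32_t len)
{
    return len << 2;
}

static uint32_t
toy_decode_len(uint32_t header)
{
    return header >> 2;
}
```

For ndims = 1 the header is 16 + 8 = 24 bytes, and 24 bytes of header plus 24 bytes of data gives the 48 seen in gdb.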
result->ndim = ndims;
result->dataoffset = dataoffset;
result->elemtype = elmtype;
memcpy(ARR_DIMS(result), dims, ndims * sizeof(int));
memcpy(ARR_LBOUND(result), lbs, ndims * sizeof(int));
CopyArrayEls(result,
elems, nulls, nelems,
elmlen, elmbyval, elmalign,
false);
return result;
}
Final memory layout
Step 2: before the assignment, expand_array converts the ArrayType into an ExpandedArray
arr int[] = ARRAY[1,2,3,4,5,6];
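Once the right-hand side of the assignment has been evaluated, the ArrayType structure shown in the figure above has been built. It now has to be wrapped in the expanded array structure before use, which gives the array a parent memory context (mcxt) and therefore an owner.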
Datum
expand_array(Datum arraydatum, MemoryContext parentcontext,
ArrayMetaState *metacache)
{
ArrayType *array;
ExpandedArrayHeader *eah;
MemoryContext objcxt;
MemoryContext oldcxt;
ArrayMetaState fakecache;
Create the "expanded array" memory context as a child of the parent context passed in, here "SPI Proc".
objcxt = AllocSetContextCreate(parentcontext,
"expanded array",
ALLOCSET_START_SMALL_SIZES);
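/*
 * A standalone toy model of parent/child context ownership, to show why
 * hanging "expanded array" under "SPI Proc" gives the array an owner.
 * ToyContext and the toy_context_* functions are hypothetical and far
 * simpler than PostgreSQL's MemoryContext machinery.
 */

```c
#include <stdlib.h>

typedef struct ToyContext
{
    const char        *name;
    struct ToyContext *parent;
    struct ToyContext *first_child;
    struct ToyContext *next_sibling;
} ToyContext;

/* Create a context and link it into the child list of its parent, if any. */
static ToyContext *
toy_context_create(ToyContext *parent, const char *name)
{
    ToyContext *cxt = calloc(1, sizeof(ToyContext));

    if (cxt == NULL)
        return NULL;
    cxt->name = name;
    cxt->parent = parent;
    if (parent != NULL)
    {
        cxt->next_sibling = parent->first_child;
        parent->first_child = cxt;
    }
    return cxt;
}

/*
 * Delete a context together with all of its descendants.
 * Returns the number of contexts that were freed.
 */
static int
toy_context_delete(ToyContext *cxt)
{
    int freed = 1;

    /* Unlink from the parent's child list first. */
    if (cxt->parent != NULL)
    {
        ToyContext **link = &cxt->parent->first_child;

        while (*link != cxt)
            link = &(*link)->next_sibling;
        *link = cxt->next_sibling;
    }

    /* Each recursive call unlinks one child, so this loop terminates. */
    while (cxt->first_child != NULL)
        freed += toy_context_delete(cxt->first_child);

    free(cxt);
    return freed;
}
```

Deleting the parent frees the child as well, so an expanded array cannot outlive the context that owns it.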
/* Set up expanded array header */
eah = (ExpandedArrayHeader *)
MemoryContextAlloc(objcxt, sizeof(ExpandedArrayHeader));
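Initialize the EOH structure
- eah->hdr: the array structure extends the EOH, and &eah->hdr is the EOH to initialize
- EA_methods: the array-specific conversion functions EA_get_flat_size and EA_flatten_into, which turn the expanded structure back into the storage structure; the storage structure here is the compact ArrayType structure from the figure above
- objcxt: the memory context to record in the header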
EOH_init_header(&eah->hdr, &EA_methods, objcxt);
eah->ea_magic = EA_MAGIC;
Next, the compact structure is unpacked into the ExpandedArrayHeader struct.
First switch to the "expanded array" context and copy the flat array data into it.
oldcxt = MemoryContextSwitchTo(objcxt);
array = DatumGetArrayTypePCopy(arraydatum);
MemoryContextSwitchTo(oldcxt);
p eah->ndims = 1
p eah->dims[0] = 6
p eah->lbound[0] = 1
p eah->element_type = 23
p eah->typlen = 4
p eah->typbyval = true
p eah->typalign = 'i'
eah->ndims = ARR_NDIM(array);
/* note these pointers point into the fvalue header! */
eah->dims = ARR_DIMS(array);
eah->lbound = ARR_LBOUND(array);
eah->element_type = ARR_ELEMTYPE(array);
...
get_typlenbyvalalign(eah->element_type,
&eah->typlen,
&eah->typbyval,
&eah->typalign);
...
/* we don't make a deconstructed representation now */
eah->dvalues = NULL;
eah->dnulls = NULL;
eah->dvalueslen = 0;
eah->nelems = 0;
eah->flat_size = 0;
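eah->fvalue points to the start of the flat header
fstartptr points to the start of the flat data area
fendptr points to the end of the whole flat structure (end+1)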
/* remember we have a flat representation */
eah->fvalue = array;
eah->fstartptr = ARR_DATA_PTR(array);
eah->fendptr = ((char *) array) + ARR_SIZE(array);
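/*
 * A standalone sketch of what the fstartptr/fendptr pair is for: deciding
 * whether an element pointer lives inside the flat data area or in a
 * separately allocated chunk. toy_points_into_flat is a hypothetical name.
 */

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true when ptr lies in the half-open range [fstartptr, fendptr).
 * The addresses are compared as integers so that pointers into unrelated
 * objects can be tested as well.
 */
static bool
toy_points_into_flat(const char *ptr, const char *fstartptr, const char *fendptr)
{
    uintptr_t p = (uintptr_t) ptr;

    return p >= (uintptr_t) fstartptr && p < (uintptr_t) fendptr;
}
```

Because fendptr is end+1, the first byte after the flat structure is not considered part of it, and a pointer into the header (before fstartptr) is not part of the data area either.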
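Note that what is returned is the EOH's eoh_rw_ptr pointer (to recap: eoh_rw_ptr points to a 1be structure whose data part holds the pointer to the EOH header).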
/* return a R/W pointer to the expanded array */
return EOHPGetRWDatum(&eah->hdr);
}