PostgreSQL Source Code Self-Study Notes 3

朱峥嵘（朱髯）

Tags: postgresql

Article link: https://blog.csdn.net/zzrisme/article/details/118784655

Today's topic is cost_index, the function PG uses to estimate the cost of an index scan. A few key points:

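First, if index scans are disabled (enable_indexscan is off), the usual penalty applies: disable_cost, which is 1.0e10 (10 billion), is added to the startup cost: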

	if (!enable_indexscan)
		startup_cost += disable_cost;

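Second, the function considers how well the index is ordered relative to the table, similar to the index cluster ratio in Db2. The degree to which the ordering of the indexed column agrees with the physical ordering of the table rows is measured by: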

csquared = indexCorrelation * indexCorrelation;

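indexCorrelation is a correlation coefficient between -1 and 1, so csquared lies between 0 and 1. It behaves like a coefficient of determination rather than a covariance: 1 means the index order matches the table order exactly (ascending or descending), and 0 means the two orders are completely unrelated.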

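If the orders are completely unrelated, every page fetched is costed as random I/O. If they match perfectly, only the first page is random I/O and the remaining n-1 pages are sequential I/O. Note that pages_fetched is reassigned between the two cases in the excerpt: for max_IO_cost it holds an estimate computed earlier in the function (not shown here), and for min_IO_cost it is recomputed from indexSelectivity: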

		/* max_IO_cost is for the perfectly uncorrelated case (csquared=0) */
		max_IO_cost = pages_fetched * spc_random_page_cost;

		/* min_IO_cost is for the perfectly correlated case (csquared=1) */
		pages_fetched = ceil(indexSelectivity * (double) baserel->pages);

		if (indexonly)
			pages_fetched = ceil(pages_fetched * (1.0 - baserel->allvisfrac));

		if (pages_fetched > 0)
		{
			min_IO_cost = spc_random_page_cost;
			if (pages_fetched > 1)
				min_IO_cost += (pages_fetched - 1) * spc_seq_page_cost;
		}
		else
			min_IO_cost = 0;

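This gives the maximum and minimum disk I/O cost. In a real case csquared is some number between 0 and 1, so the next step uses csquared as the weight for a linear interpolation between max_IO_cost and min_IO_cost: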

	csquared = indexCorrelation * indexCorrelation;

	run_cost += max_IO_cost + csquared * (min_IO_cost - max_IO_cost);

A very sensible idea, and an elegantly simple calculation.

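The CPU cost that follows is computed the same way as for a seq scan; see Notes 1. The one thing to point out is that the row count here only needs to be multiplied by the selectivity. Selectivity is the fraction of rows that pass the filter, also called the filter factor in Db2. The code that computes it is in selfuncs.c, which I will analyze next time.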
