【源码解读】CockroachDB Hash Join算子逻辑解读

随处可见的打字员

已于 2022-03-21 18:44:40 修改

阅读量628

点赞数 2

分类专栏：日志文章标签：哈希算法数据库 go语言

于 2021-12-10 12:03:41 首次发布

本文链接：https://blog.csdn.net/qq_40948559/article/details/121851725

版权

日志专栏收录该内容

2 篇文章 0 订阅

订阅专栏

关于CockroachDB

CockroachDB 是一个分布式关系型数据库，主要设计目标是可扩展，强一致和高可靠。CockroachDB 旨在无人为干预情况下，以极短的中断时间容忍磁盘、主机、机架甚至整个数据中心的故障。 CockroachDB 采用完全去中心化架构，集群中各个节点的地位完全对等，同时所有功能封装在一个二进制文件中，可以做到尽量不依赖配置文件直接部署。

关于Hash Join

所谓 Hash Join 就是在表的 join 时候选择一张表作为 buildSide 表来构造哈希表，另外一张表作为 probeSide 表，然后对 probeSide 表的每一行数据都去这个哈希表中查找是否有匹配的数据（其中 buildSide 表通常是小表，probeSide 表通常是较大的表）。

源码解读

本文解析源码位于cockroach/pkg/sql/opt/xform/coster.go

cockroach/coster.go at master · cockroachdb/cockroach · GitHubhttps://github.com/cockroachdb/cockroach/blob/master/pkg/sql/opt/xform/coster.gocomputeHashJoinCost函数，大致位于779-837行

算子逻辑

Hash Join一般用于连接中，左右表数据量较大并且左表行数明显大于右表。连接过程中 Hash Join 为小表建立 hash table，读取大表的每一行，计算关联列的 hash key，通过 hash map 映射到对应元组。

代价计算涉及常数因子及辅助函数

常数因子

hugeCost：表示的是通常大于开销模型估计结果的一个代价值，hugeCost=1e100
cpuCostFactor：从 PostgreSQL 中的 DEFAULT_CPU_TUPLE_COST 借鉴而来，cpuCostFactor=0.01

辅助函数

computeFiltersCost：返回执行过滤器的设置和每行成本。这个函数的调用者应该添加 setupCost，并将 perRowCost 乘以预期要过滤的行数。

setupCost += cpuCostFactor
perRowCost += cpuCostFactor

此处是为了添加一个基本的 perRowCost，这样调用者就不需要有自己的每行基本成本

算子开销计算公式及流程

计算公式

Cost = 1.25 * leftRowCount + 1.75 * rightRowCount * cpuCostFactor + filterSetup + rowsProcessed * filterPerRow

其中 1.25 是左边计算 hashkey 的开销，1.75 是建立 hashtable 的开销

具体计算流程

首先判断 hint 是否禁止 HashJoin，禁止则返回 hugeCost，否则进入下一步。
得到leftRowCount 和 rightRowCount。
读表代价： cost := memo.Cost(1.25*leftRowCount+1.75*rightRowCount) * cpuCostFactor。
计算 filter cost。FiltersItem 筛选由包含 Select 或 Join 操作符的行，只有当所有条件都为 true 时，才过滤一行。如果该集合为空，则它永远不会过滤行；func ExtractJoinEqualityColumns 提取连接相等列，返回一对列（一个来自左边，一个来自右边），它们在连接中被约束为相等（并且具有等效类型）。
eqMap := util.FastIntMap{}定义相等过滤器。FastIntMap 当键和值都很小时，它更有效，它可以通过值传递。
估算重新计算的代价：cost += filterSetup。
计算两表连接后最终需要总行数的代价 cost+= memo.Cost(rowsProcessed) * filterPerRow，最后返回 cost 值。

实例测试

一般测试

SQL语句

SELECT * 
FROM customer 
JOIN nation 
ON (c_nationkey = n_nationkey) ;

表的情况

Input	Left Table: customer	Left row count=150000
Input	Right Table: nation	Right row count=25

实验结果

读表代价	filter cost	连接后的最终代价
leftRowCount = 150000	filterPerRow = 0.01	cost+=memo.Cost(rowsProcessed)*filterPerRow
rightRowCount = 25	filterSetup = 0.01	cost+=memo.Cost(rowsProcessed)*filterPerRow
cost = 1875.4375	cost =1875.4475	cost = 3375.4475
Output: cost=3375.4475

Hash Join、Merge Join、Lookup Join 实验对比：

SQL语句

SELECT * 
FROM customer 
JOIN nation 
ON (c_nationkey = n_nationkey)
WHERE c_custkey > 100000;

表的情况

Input	Left Table: customer	Left row count=150000
Input	Right Table: nation	Right row count=25

实验结果

Hash Join ComputeCost	Merge Join ComputeCost	Lookup Join ComputeCost
cost=1161.2729+0.01+6 0914... = 62075...	cost=1032.0948+0.01+780 99... = 79131...	cost=363914...

省略号...指小数点后的值

实验结论

表中 Lookup Join ComputeCost 明显远大于 Hash Join 和 Merge Join；
算子方法返回 cost：computeHashJoinCost(cost ≈ 1161)> computeMergeJoinCost(cost ≈ 1032)，这是因为 Merge Join 在 join 的列无序的情况下最后计算所得的 ComputeCost 还需要加上 sort 的 cost，使得 Merge Join 的 cost 大于 Hash Join；
实际执行计划情况对比：

从上图可以看出 Merge Join 需要对无序列排序；
在实际执行计划的选择中，Merge Join 需要 join 的列排序 cost 不高的情况，当列无序并且左表行数明显大于右表的时候执行计划会倾向选择 Hash Join。

附录：

func (c *coster) computeHashJoinCost(join memo.RelExpr) memo.Cost {
	if join.Private().(*memo.JoinPrivate).Flags.Has(memo.DisallowHashJoinStoreRight) {
		return hugeCost
	}
	leftRowCount := join.Child(0).(memo.RelExpr).Relational().Stats.RowCount
	rightRowCount := join.Child(1).(memo.RelExpr).Relational().Stats.RowCount
	if (join.Op() == opt.SemiJoinOp || join.Op() == opt.AntiJoinOp) && leftRowCount < rightRowCount {
		// If we have a semi or an anti join, during the execbuilding we choose
		// the relation with smaller cardinality to be on the right side, so we
		// need to swap row counts accordingly.
		// TODO(raduberinde): we might also need to look at memo.JoinFlags when
		// choosing a side.
		leftRowCount, rightRowCount = rightRowCount, leftRowCount
	}

	// A hash join must process every row from both tables once.
	//
	// We add some factors to account for the hashtable build and lookups. The
	// right side is the one stored in the hashtable, so we use a larger factor
	// for that side. This ensures that a join with the smaller right side is
	// preferred to the symmetric join.
	cost := memo.Cost(1.25*leftRowCount+1.75*rightRowCount) * cpuCostFactor

	// Add a cost for buffering rows that takes into account increased memory
	// pressure and the possibility of spilling to disk.
	cost += c.rowBufferCost(rightRowCount)

	// Compute filter cost. Fetch the equality columns so they can be
	// ignored later.
	on := join.Child(2).(*memo.FiltersExpr)
	leftEq, rightEq := memo.ExtractJoinEqualityColumns(
		join.Child(0).(memo.RelExpr).Relational().OutputCols,
		join.Child(1).(memo.RelExpr).Relational().OutputCols,
		*on,
	)
	// Generate a quick way to lookup if two columns are join equality
	// columns. We add in both directions because we don't know which way
	// the equality filters will be defined.
	eqMap := util.FastIntMap{}
	for i := range leftEq {
		left := int(leftEq[i])
		right := int(rightEq[i])
		eqMap.Set(left, right)
		eqMap.Set(right, left)
	}
	filterSetup, filterPerRow := c.computeFiltersCost(*on, eqMap)
	cost += filterSetup

	// Add the CPU cost of emitting the rows.
	rowsProcessed, ok := c.mem.RowsProcessed(join)
	if !ok {
		// This can happen as part of testing. In this case just return the number
		// of rows.
		rowsProcessed = join.Relational().Stats.RowCount
	}
	cost += memo.Cost(rowsProcessed) * filterPerRow

	return cost
}