SQL Server 执行连接的方式 - Hash Join

最新推荐文章于 2022-11-11 09:50:17 发布

Warren

最新推荐文章于 2022-11-11 09:50:17 发布

阅读量1k

点赞数 1

分类专栏： SQL Server 文章标签： Hash Join

SQL Server 专栏收录该内容

3 篇文章 0 订阅

订阅专栏

The hash join has two inputs: the build input and probe input. The query optimizer assigns these roles so that the smaller of the two inputs is the build input.

Hash joins are used for many types of set-matching operations: inner join; left, right, and full outer join; left and right semi-join; intersection; union; and difference. Moreover, a variant of the hash join can do duplicate removal and grouping, such as SUM(salary) GROUP BY department. These modifications use only one input for both the build and probe roles.

The following sections describe different types of hash joins: in-memory hash join, grace hash join, and recursive hash join.

In-Memory Hash Join

The hash join first scans or computes the entire build input and then builds a hash table in memory. Each row is inserted into a hash bucket depending on the hash value computed for the hash key. If the entire build input is smaller than the available memory, all rows can be inserted into the hash table. This build phase is followed by the probe phase. The entire probe input is scanned or computed one row at a time, and for each probe row, the hash key's value is computed, the corresponding hash bucket is scanned, and the matches are produced.

Grace Hash Join

If the build input does not fit in memory, a hash join proceeds in several steps. This is known as a grace hash join. Each step has a build phase and probe phase. Initially, the entire build and probe inputs are consumed and partitioned (using a hash function on the hash keys) into multiple files. Using the hash function on the hash keys guarantees that any two joining records must be in the same pair of files. Therefore, the task of joining two large inputs has been reduced to multiple, but smaller, instances of the same tasks. The hash join is then applied to each pair of partitioned files.

Recursive Hash Join

If the build input is so large that inputs for a standard external merge would require multiple merge levels, multiple partitioning steps and multiple partitioning levels are required. If only some of the partitions are large, additional partitioning steps are used for only those specific partitions. In order to make all partitioning steps as fast as possible, large, asynchronous I/O operations are used so that a single thread can keep multiple disk drives busy.

It is not always possible during optimization to determine which hash join is used. Therefore, SQL Server starts by using an in-memory hash join and gradually transitions to grace hash join, and recursive hash join, depending on the size of the build input.

If the optimizer anticipates wrongly which of the two inputs is smaller and, therefore, should have been the build input, the build and probe roles are reversed dynamically. The hash join makes sure that it uses the smaller overflow file as build input. This technique is called role reversal. Role reversal occurs inside the hash join after at least one spill to the disk.

Hash Bailout

The term hash bailout is sometimes used to describe grace hash joins or recursive hash joins.

http://technet.microsoft.com/en-us/library/ms189313(v=sql.105).aspx

Warren

关注

1
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
SQL Server 执行连接的方式 - Hash Join

The hash join has two inputs: the build input and probe input. The query optimizer assigns these roles so that the smaller of the two inputs is the build input.Hash joins are used for many types of
复制链接

扫一扫