SQL SERVER JOIN OPERATOR

最新推荐文章于 2022-12-06 16:02:07 发布

dbLenis

最新推荐文章于 2022-12-06 16:02:07 发布

阅读量1.2k

点赞数

分类专栏： Performance Tuning 文章标签： join sql server table nested build processing

Performance Tuning 专栏收录该内容

14 篇文章 1 订阅

订阅专栏

面试中被问到NESTED LOOP JOIN的问题，做个回顾：

Nested Loop

The nested loop join is the original SQL Server join type. The behavior of a nested loop is to scan all the rows in one table (the outer table)

and for each row in that table, it then scans every row in the other table (the inner table). If the rows in the outer and inner tables match,

then the row is included in the results.

The performance of this join is directly proportional to the number of rows in each table. It performs well when there are relatively few rows

in one of the tables, which would be chosen as the inner table, and more rows in the other table, which would then be chosen as the

outer table. If both tables have a relatively large number of rows, then this join starts to take a very long time.

SELECT A.*,B.*

FROM A

LEFT JOIN B

ON A.AID=B.BID

假设，这里的A表是 OUTER (LOOP) TABLE，B表是INNER (LOOP) TABLE，

那么优化条件是：1.两张表量小；2.B表有BID的索引；3.B表的数据量比A表小

Merge

The merge join needs its inputs to be sorted, so ideally the tables should be indexed on the join column. Then the operator iterates through rows from

both tables at the same time, working down the rows, looking for matches. Because the inputs are ordered, this enables the join to proceed quickly,

and to end as soon as any range is satisfied.

条件：1. 2个表都有索引；2.结果需要排序

Hash

The hash join operates in two phases. During the first phase, known as the build phase, the smaller of the two tables is scanned and the rows are

placed into a hash table that is ideally stored in memory, but for very large tables, it can be written to disk. When every row in the build input table is hashed,

the second phase starts. During the second phase, known as the probe phase, rows from the larger of the two tables are compared to the contents

of the hash table, using the same hashing algorithmthat was used to create the build table hash. Any matching rows are passed to the output. The hash join

has variations on this processing that can deal with very large tables, and therefore the hash join is the join of choice for very large input tables, especially

when running on multiprocessor systems where parallel plans are allowed.

条件：1. 1小表，1超大表； 2.HASH算法，回头看数据结构与算法去吧，这里真不熟悉