面试中被问到NESTED LOOP JOIN的问题,做个回顾:
Nested Loop
The nested loop join is the original SQL Server join type. The behavior of a nested loop is to scan all the rows in one table (the outer table)
and for each row in that table, it then scans every row in the other table (the inner table). If the rows in the outer and inner tables match,
then the row is included in the results.
The performance of this join is directly proportional to the number of rows in each table. It performs well when there are relatively few rows
in one of the tables, which would be chosen as the inner table, and more rows in the other table, which would then be chosen as the
outer table. If both tables have a relatively large number of rows, then this join starts to take a very long time.
SELECT A.*,B.*
FROM A
LEFT JOIN B
ON A.AID=B.BID
假设,这里的A表是 OUTER (LOOP) TABLE,B表是INNER (LOOP) TABLE,
那么优化条件是:1.两张表量小;2.B表有BID的索引;3.B表的数据量比A表小
Merge
The merge join needs its inputs to be sorted, so ideally the tables should be indexed on the join column. Then the operator iterates through rows from
both tables at the same time, working down the rows, looking for matches. Because the inputs are ordered, this enables the join to proceed quickly,
and to end as soon as any range is satisfied.
条件:1. 2个表都有索引;2.结果需要排序
Hash
The hash join operates in two phases. During the first phase, known as the build phase, the smaller of the two tables is scanned and the rows are
placed into a hash table that is ideally stored in memory, but for very large tables, it can be written to disk. When every row in the build input table is hashed,
the second phase starts. During the second phase, known as the probe phase, rows from the larger of the two tables are compared to the contents
of the hash table, using the same hashing algorithmthat was used to create the build table hash. Any matching rows are passed to the output. The hash join
has variations on this processing that can deal with very large tables, and therefore the hash join is the join of choice for very large input tables, especially
when running on multiprocessor systems where parallel plans are allowed.
条件:1. 1小表,1超大表; 2.HASH算法,回头看数据结构与算法去吧,这里真不熟悉