图解数据库连接查询（JOIN）的三种实现算法： MySQL、Oracle、SQL Server 等

最新推荐文章于 2025-04-07 09:00:00 发布

不剪发的Tony老师

最新推荐文章于 2025-04-07 09:00:00 发布

阅读量7.3k

点赞数 30

分类专栏：数据库文章标签： mysql oracle sqlserver 嵌套循环连接哈希连接

本文链接：https://blog.csdn.net/horses/article/details/105700677

版权

数据库专栏收录该内容

149 篇文章

订阅专栏

本文深入探讨了数据库中实现连接查询的三种核心算法：嵌套循环连接、哈希连接及排序合并连接，分析了各算法的工作原理、优缺点及适用场景。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

文章目录

An SQL query walks into a bar and sees two tables. He walks up to them and asks ‘Can I join you?’
一个 SQL 查询走进酒吧看到两张桌子（table），走到它们面前问“我可以和你们坐在一起（join）吗？”😜

SQL 连接查询（JOIN）可以同时获取多个表中的关联数据。例如，查看某个完整的订单数据时，可能需要从产品表、用户表、用户订单表、以及订单明细表中获取相关的信息。

不过，今天我们要讨论的是数据库内部如何利用算法实现连接查询。通常实现连接查询的算法有三种：Nested Loop Join、Hash Join 以及 Sort Merge Join。本文涉及到的数据库包括 MySQL、Oracle、SQL Server、PostgreSQL 以及 SQLite，首先给出结论：

Join 实现算法	MySQL	Oracle	SQL Server	PostgreSQL	SQLite
Nested Loop Join	✔️	✔️	✔️	✔️	✔️
Hash Join	✔️	✔️	✔️	✔️	❌
Sort Merge Join	❌	✔️	✔️	✔️	❌

📝关于各种内、外连接的查询方式和语法可以参考这篇文章。

接下来针对三种算法进行具体的分析。

嵌套循环连接

嵌套循环连接（Nested Loop Join）是一种最基本的连接实现算法。它先从外部表（驱动表）中获取满足条件的数据，然后为每一行数据遍历一次内部表（被驱动表），获取所有匹配的数据。下图演示了嵌套循环连接的执行过程（图片来源于bertwagner）：

Nested Loop Join

Nested Loop Join 类似于编程语言中的嵌套 for 循环；当然，数据库在实现时会进行各种优化，例如通过索引提高扫描速度。

我们可以通过执行计划查看 JOIN 的实现方式，先看 MySQL 中的以下示例（示例表来自这里）：

-- MySQL
explain analyze
select e.first_name,e.last_name,e.salary,d.department_name
from employees e
join departments d on (e.department_id = d.department_id)
where d.department_name = 'IT';
  
-> Nested loop inner join  (cost=7.38 rows=24) (actual time=0.080..0.102 rows=5 loops=1)
    -> Filter: (d.department_name = 'IT')  (cost=2.95 rows=3) (actual time=0.043..0.061 rows=1 loops=1)
        -> Table scan on d  (cost=2.95 rows=27) (actual time=0.036..0.050 rows=27 loops=1)
    -> Index lookup on e using emp_department_ix (department_id=d.department_id)  (cost=1.08 rows=9) (actual time=0.035..0.038 rows=5 loops=1)

对于以上查询，MySQL 选择了使用 Nested loop inner join 算法；departments 是驱动表，循环 1 次返回 1 行数据；employees 是被驱动表，使用索引进行遍历，然后回表查找表中的数据，循环了 1 次（因为 departments 返回了 1 条记录）。实际上 MySQL 对这个嵌套循环连接进行了优化，采用的是 Index Nested Loop Join 算法，在内层循环中扫描索引 emp_department_ix 而不是数据表，从而提高效率。

📝关于各种数据库中执行计划的查看方法，可以参考这篇文章。

下面是该语句在 Oracle 中的执行计划：

-- Oracle
EXPLAIN PLAN FOR
select e.first_name,e.last_name,e.salary,d.department_name
from employees e
join departments d on (e.department_id = d.department_id)
where d.department_name = 'IT';

SELECT * FROM TABLE(DBMS_XPLAN.display);
PLAN_TABLE_OUTPUT                                                                                 |
--------------------------------------------------------------------------------------------------|
Plan hash value: 1021246405                                                                       |
                                                                                                  |
--------------------------------------------------------------------------------------------------|
| Id  | Operation                    | Name              | Rows  | Bytes | Cost (%CPU)| Time     ||
--------------------------------------------------------------------------------------------------|
|   0 | SELECT STATEMENT             |                   |    10 |   380 |     4   (0)| 00:00:01 ||
|   1 |  NESTED LOOPS                |                   |    10 |   380 |     4   (0)| 00:00:01 ||
|   2 |   NESTED LOOPS               |                   |    10 |   380 |     4   (0)| 00:00:01 ||
|*  3 |    TABLE ACCESS FULL         | DEPARTMENTS       |     1 |    16 |     3   (0)| 00:00:01 ||
|*  4 |    INDEX RANGE SCAN          | EMP_DEPARTMENT_IX |    10 |       |     0   (0)| 00:00:01 ||
|   5 |   TABLE ACCESS BY INDEX ROWID| EMPLOYEES         |    10 |   220 |     1   (0)| 00:00:01 ||
--------------------------------------------------------------------------------------------------|
                                                                                                  |
Predicate Information (identified by operation id):                                               |
---------------------------------------------------                                               |
                                                                                                  |
   3 - filter("D"."DEPARTMENT_NAME"='IT')                                                         |
   4 - access("E"."DEPARTMENT_ID"="D"."DEPARTMENT_ID")                                            |
                                                                                                  |
Note                                                                                              |
-----                                                                                             |
   - this is an adaptive plan                                                                     |

Oracle 也是选择了 departments 表作为，然后通过索引（EMP_DEPARTMENT_IX）范围扫描进行遍历找出满足连接条件的索引值和 ROWID，最后通过遍历这些索引 ROWID 获取 employees 中的数据。

SQL Server 的执行计划和 Oracle 几乎完全一致：

-- SQL Server
SET STATISTICS PROFILE ON
select e.first_name,e.last_name,e.salary,d.department_name
from employees e
join departments d on (e.department_id = d.department_id)
where d.department_name = 'IT';
SET STATISTICS PROFILE OFF

Rows|Executes|StmtText                                                                                                                                                                                             |StmtId|NodeId|Parent|PhysicalOp          |LogicalOp           |Argument                                                                                                                                                                  |DefinedValues                                  |EstimateRows|EstimateIO|EstimateCPU|AvgRowSize|TotalSubtreeCost|OutputList                                                            |Warnings|Type    |Parallel|EstimateExecutions|
----|--------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------|------|------|--------------------|--------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------|------------|----------|-----------|----------|----------------|----------------------------------------------------------------------|--------|--------|--------|------------------|
   5|       1|select e.first_name,e.last_name,e.salary,d.department_name¶from employees e¶join departments d on (e.department_id = d.department_id)¶where d.department_name = 'IT'                                 |     1|     1|     0|                    |                    |                                                                                                                                                                          |                                               |  8.83333302|          |           |          |      0.01430937|                                                                      |        |SELECT  |       0|                  |
   5|       1|  |--Nested Loops(Inner Join, OUTER REFERENCES:([e].[employee_id]))                                                                                                                                  |     1|     2|     1|Nested Loops        |Inner Join          |OUTER REFERENCES:([e].[employee_id])                                                                                                                                      |                                               |  8.83333302|         0| 0.00003692|        52|      0.01430937|[e].[first_name], [e].[last_name], [e].[salary], [d].[department_name]|        |PLAN_ROW|       0|                 1|
   5|       1|       |--Nested Loops(Inner Join, OUTER REFERENCES:([d].[department_id]))                                                                                                                           |     1|     3|     2|Nested Loops        |Inner Join          |OUTER REFERENCES:([d].[department_id])                                                                                                                                    |                                               |  8.83333302|         0| 0.00003692|        25|       0.0066533|[e].[employee_id], [d].[department_name]                              |        |PLAN_ROW|       0|                 1|
   1|       1|       |    |--Clustered Index Scan(OBJECT:([hrdb].[dbo].[departments].[dept_id_pk] AS [d]), WHERE:([hrdb].[dbo].[departments].[department_name] as [d].[department_name]='IT'))                     |     1|     4|     3|Clustered Index Scan|Clustered Index Scan|OBJECT:([hrdb].[dbo].[departments].[dept_id_pk] AS [d]), WHERE:([hrdb].[dbo].[departments].[department_name] as [d].[department_name]='IT')                               |[d].[department_id], [d].[department_name]     |           1|  0.003125|  0.0001867|        25|       0.0033117|[d].[department_id], [d].[department_name]                            |        |PLAN_ROW|       0|                 1|
   5|       1|       |    |--Index Seek(OBJECT:([hrdb].[dbo].[employees].[emp_department_ix] AS [e]), SEEK:([e].[department_id]=[hrdb].[dbo].[departments].[department_id] as [d].[department_id]) ORDERED FORWARD)|     1|     5|     3|Index Seek          |Index Seek          |OBJECT:([hrdb].[dbo].[employees].[emp_department_ix] AS [e]), SEEK:([e].[department_id]=[hrdb].[dbo].[departments].[department_id] as [d].[department_id]) ORDERED FORWARD|[e].[employee_id]                              |  8.83333302|  0.003125| 0.00016672|        11|      0.00329172|[e].[employee_id]                                                     |        |PLAN_ROW|       0|                 1|
   5|       5|       |--Clustered Index Seek(OBJECT:([hrdb].[dbo].[employees].[emp_emp_id_pk] AS [e]), SEEK:([e].[employee_id]=[hrdb].[dbo].[employees].[employee_id] as [e].[employee_id]) LOOKUP ORDERED FORWARD)|     1|     7|     2|Clustered Index Seek|Clustered Index Seek|OBJECT:([hrdb].[dbo].[employees].[emp_emp_id_pk] AS [e]), SEEK:([e].[employee_id]=[hrdb].[dbo].[employees].[employee_id] as [e].[employee_id]) LOOKUP ORDERED FORWARD     |[e].[first_name], [e].[last_name], [e].[salary]|           1|  0.003125|  0.0001581|        40|      0.00761915|[e].[first_name], [e].[last_name], [e].[salary]                       |        |PLAN_ROW|       0|        8.83333302|

对于该语句，PostgreSQL 与其他数据库的实现算法都不相同：

-- PostgreSQL
explain analyze
select e.first_name,e.last_name,e.salary,d.department_name
from employees e
join departments d on (e.department_id = d.department_id)
where d.department_name = 'IT';
QUERY PLAN                                                                                                        |
------------------------------------------------------------------------------------------------------------------|
Hash Join  (cost=1.35..4.75 rows=4 width=29) (actual time=0.073..0.310 rows=5 loops=1)                            |
  Hash Cond: (e.department_id = d.department_id)                                                                  |
  ->  Seq Scan on employees e  (cost=0.00..3.07 rows=107 width=22) (actual time=0.022..0.064 rows=107 loops=1)    |
  ->  Hash  (cost=1.34..1.34 rows=1 width=15) (actual time=0.032..0.032 rows=1 loops=1)                           |
        Buckets: 1024  Batches: 1  Memory Usage: 9kB                                                              |
        ->  Seq Scan on departments d  (cost=0.00..1.34 rows=1 width=15) (actual time=0.016..0.028 rows=1 loops=1)|
              Filter: ((department_name)::text = 'IT'::text)                                                      |
              Rows Removed by Filter: 26                                                                          |
Planning Time: 0.502 ms                                                                                           |
Execution Time: 0.362 ms                                                                                          |

当然，PostgreSQL 支持 Nested Loop Join，只是在这里它认为 Hash Join 是更好的实现方式。关于 Hash Join 的介绍可以参考下文。

📝如果我们使用set enable_hashjoin=off;禁用 PostgreSQL 中的哈希连接，可以看到以上示例的执行计划变成了嵌套循环连接。测试之后记得执行set enable_hashjoin=on;启用哈希连接。

最后是 SQLite 中的执行计划：

-- SQLite
explain query plan
select e.first_name,e.last_name,e.salary,d.department_name
from employees e
join departments d on (e.department_id = d.department_id)
where d.department_name = 'IT';
id|parent|notused|detail                                                                     |
--|------|-------|---------------------------------------------------------------------------|
 4|     0|      0|SCAN TABLE departments AS d                                                |
 8|     0|      0|SEARCH TABLE employees AS e USING INDEX emp_department_ix (department_id=?)|

SQLite 目前只实现了 Nested Loop Join，这里也是将 departments 选择为驱动表。

对于驱动表返回少量数据集的情况，嵌套循环连接通常可以获得很好的性能；如果被驱动表的连接字段上存在索引，性能会更好。

一般情况下，数据库可以自行判断哪个表作为驱动表；如果发现执行计划选择了错误的驱动表，首先应该考虑统计信息是否正确；许多数据库支持使用优化器提示（hint）指定连接查询中表的顺序，建议谨慎使用。

哈希连接

哈希连接（Hash Join）使用其中一个表中满足条件的记录创建哈希表，然后扫描另一个表进行匹配。哈希连接的执行过程如下图所示：

许多数据库都支持哈希连接实现，MySQL 8.0.18 也加入了哈希连接，例如：

-- MySQL
explain analyze
select e.first_name,e.last_name,e.salary,d.first_name 
from employees e
join employees d on (e.salary = d.salary);
-> Inner hash join (d.salary = e.salary)  (cost=1156.11 rows=1145) (actual time=0.582..1.006 rows=271 loops=1)
    -> Table scan on d  (cost=0.01 rows=107) (actual time=0.100..0.246 rows=107 loops=1)
    -> Hash
        -> Table scan on e  (cost=10.95 rows=107) (actual time=0.199..0.271 rows=107 loops=1)

在上面的查询中，我们使用 salary 字段连接两个 employees 表；由于该字段没有索引，MySQL 选择了 Inner hash join。通常来说，优化器会选择两者中的小表或者数据源建立哈希表。

对于上面的示例，Oracle、SQL Server 以及 PostgreSQL 都选择了哈希连接的方式；SQLite 不支持哈希连接，仍然使用嵌套循环连接。

哈希连接是执行大数据集连接时的常用方式，但是它不支持范围连接条件（t1.col < t2.col1）。对于哈希连接而言，不需要基于连接字段创建索引，因为它不会利用索引进行连接。当然，为WHERE条件中的字段创建索引总是可以优化性能。

哈希连接使用内存构建哈希表，但是如果数据量太大，需要使用磁盘临时表。在SELECT中选择更少的字段也可以提高哈希连接的性能，因为哈希表中存储了所有需要的字段。

排序合并连接

排序合并连接（Sort Merge Join）先将两个数据源按照连接字段进行排序（Sort），然后合并两个已经排序的集合，返回满足连接条件的结果。排序合并连接的执行过程如下图所示：

merge join
以下是 Oracle 中的一个排序合并连接的示例：

-- Oracle
EXPLAIN PLAN FOR
select e.first_name,e.last_name,e.salary,d.department_name
from employees e
join departments d on (e.department_id = d.department_id);

SELECT * FROM TABLE(DBMS_XPLAN.display);
PLAN_TABLE_OUTPUT                                                                           |
--------------------------------------------------------------------------------------------|
Plan hash value: 1343509718                                                                 |
                                                                                            |
--------------------------------------------------------------------------------------------|
| Id  | Operation                    | Name        | Rows  | Bytes | Cost (%CPU)| Time     ||
--------------------------------------------------------------------------------------------|
|   0 | SELECT STATEMENT             |             |   106 |  4028 |     6  (17)| 00:00:01 ||
|   1 |  MERGE JOIN                  |             |   106 |  4028 |     6  (17)| 00:00:01 ||
|   2 |   TABLE ACCESS BY INDEX ROWID| DEPARTMENTS |    27 |   432 |     2   (0)| 00:00:01 ||
|   3 |    INDEX FULL SCAN           | DEPT_ID_PK  |    27 |       |     1   (0)| 00:00:01 ||
|*  4 |   SORT JOIN                  |             |   107 |  2354 |     4  (25)| 00:00:01 ||
|   5 |    TABLE ACCESS FULL         | EMPLOYEES   |   107 |  2354 |     3   (0)| 00:00:01 ||
--------------------------------------------------------------------------------------------|
                                                                                            |
Predicate Information (identified by operation id):                                         |
---------------------------------------------------                                         |
                                                                                            |
   4 - access("E"."DEPARTMENT_ID"="D"."DEPARTMENT_ID")                                      |
       filter("E"."DEPARTMENT_ID"="D"."DEPARTMENT_ID")                                      |

查询首先按照索引 DEPT_ID_PK 的顺序获取 departments表中的数据，同时扫描 employees 表并且按照 department_id 列排序；然后依次比较合并这两个数据集。

对于以上语句，并不是所有数据库都会选择排序合并连接；MySQL 和 SQLite 没有实现排序合并连接，选择的是嵌套循环连接；SQL Server 也选择了嵌套循环连接，可以使用inner merge join强制使用排序合并连接；PostgreSQL 使用了哈希连接，可以使用set enable_hashjoin=off;禁用哈希连接，此时将会使用排序合并连接。

排序合并连接一般用在两张表中没有索引，并且数据已经排好序的情况。虽然这种方式执行速度很快，但大数情况下数据没有排序，因此性能不如哈希连接。