一位朋友对比MySQL和Oracle提出如下问题:
蓝色字体是对问题的回答
--建表语句
create table t1(id int not null primary key auto_increment,code varchar(20),name varchar(20));
create table t2(code varchar(20) not null primary key,c_name varchar(20),remarks varchar(20));
insert into t1 values (null,'a','name1'),(null,'b','name2'),(null,'c','name3'),(null,'d','name4'),(null,'e','name5');
insert into t2 values ('a','c_name1','name1'),('b','c_name2','name3'),('c','c_nam3','name2'),('d','c_name4','name5'),('e','c_name5','name6');
--1、t1.code无索引的情况
mysql> explain extended select * from t1 inner join t2 on t1.code=t2.code;
+----+-------------+-------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+
| 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 5 | 100.00 | NULL |
| 1 | SIMPLE | t2 | ALL | PRIMARY | NULL | NULL | NULL | 5 | 80.00 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+
回答:首先, 给出的MySQL的执行计划不正确, t2表上的主键应该被使用到
--oracle下面的执行计划
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5 | 365 | 6 (17)| 00:00:01 |
| 1 | MERGE JOIN | | 5 | 365 | 6 (17)| 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID| T2 | 5 | 180 | 2 (0)| 00:00:01 |
| 3 | INDEX FULL SCAN | PK_T2 | 5 | | 1 (0)| 00:00:01 |
|* 4 | SORT JOIN | | 5 | 185 | 4 (25)| 00:00:01 |
| 5 | TABLE ACCESS FULL | T1 | 5 | 185 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------------------
--在t1.code上无索引时,mysql执行计划是先全表扫描t1表,然后用得到的记录按照块循环嵌套算法遍历t2表查询到最终的结果集
---回答: 块,不是一次性把t1表全部扫描完毕,而是以"块"为单位,一次批量得到多个t1表的元组,拿出每一个元组然后和t2表的每个元组在连接条件满足的情况下连接.
--在oracle中却是用merge join方式实现连接。这个就是课程中讲的归并算法?
---回答: 可以这么认为. 但我没有看过Oralce的这个算法,所以只能说:原理是一样的
--两者执行计划还是相差比较大,疑问:假设t1和t2表的记录比较大,内存中没办法完全保存两表的记录,块循环嵌套算法不断循环遍历t2表的记录,这个时候效率应该会很慢吧
---回答: 效率好坏,取决于代价估算器对不同算法的代价比较,只能说Oracle经计算后认为MERGE JOIN好于Oracle提供的其它2表连接算法
--2、在t1.code上加上索引
mysql> create index idx_t1_code on t1(code);
mysql> explain extended select * from t1 inner join t2 on t1.code=t2.code;
+----+-------------+-------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+
| 1 | SIMPLE | t1 | ALL | idx_t1_code | NULL | NULL | NULL | 5 | 100.00 | NULL |
| 1 | SIMPLE | t2 | ALL | PRIMARY | NULL | NULL | NULL | 5 | 80.00 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+
回答:给出的MySQL的执行计划不正确, t2表上的主键应该被使用到
--oracle中的执行计划:依然跟之前的一样,先两表进行排序然后在归并
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5 | 365 | 6 (17)| 00:00:01 |
| 1 | MERGE JOIN | | 5 | 365 | 6 (17)| 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID| T1 | 5 | 185 | 2 (0)| 00:00:01 |
| 3 | INDEX FULL SCAN | IDX_T1_CODE | 5 | | 1 (0)| 00:00:01 |
|* 4 | SORT JOIN | | 5 | 180 | 4 (25)| 00:00:01 |
| 5 | TABLE ACCESS FULL | T2 | 5 | 180 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
--3、加上where条件
--从上面可以看出mysql用的是嵌套循环连接算法?mysql> explain extended select * from t1 inner join t2 on t1.code=t2.code where t1.code='a';
+----+-------------+-------+-------+---------------+-------------+---------+-------+------+----------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+-------+---------------+-------------+---------+-------+------+----------+-----------------------+
| 1 | SIMPLE | t2 | const | PRIMARY | PRIMARY | 62 | const | 1 | 100.00 | NULL |
| 1 | SIMPLE | t1 | ref | idx_t1_code | idx_t1_code | 63 | const | 1 | 100.00 | Using index condition |
+----+-------------+-------+-------+---------------+-------------+---------+-------+------+----------+-----------------------+
--oracle的执行计划
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 73 | 2 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 1 | 73 | 2 (0)| 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID| T2 | 1 | 36 | 1 (0)| 00:00:01 |
|* 3 | INDEX UNIQUE SCAN | PK_T2 | 1 | | 0 (0)| 00:00:01 |
| 4 | TABLE ACCESS BY INDEX ROWID| T1 | 1 | 37 | 1 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | IDX_T1_CODE | 1 | | 0 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
---回答: MySQL同样也进行了等价重写.不然,t2表key列不会是"PRIMARY".
如下是我得到的执行计划:
mysql> explain select * from t1 inner join t2 on t1.code=t2.code where t1.code='a';
+----+-------------+-------+------------+-------+---------------+-------------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+-------------+---------+-------+------+----------+-------+
| 1 | SIMPLE | t2 | NULL | const | PRIMARY | PRIMARY | 22 | const | 1 | 100.00 | NULL |
| 1 | SIMPLE | t1 | NULL | ref | idx_t1_code | idx_t1_code | 23 | const | 1 | 100.00 | NULL |
+----+-------------+-------+------------+-------+---------------+-------------+---------+-------+------+----------+-------+
2 rows in set, 1 warning (0.00 sec)
---但是,这个例子,没有使用块嵌套循环连接算法(通过观测Extra列的内容,也可以知晓,因为没有"(Block Nested Loop)"这样的内容).
---观察MySQL输出的执行计划,t2表的type值是const,这表明,对于t2表,因主键可以直接得到其元组值.
---所以,只需要对t1表按照索引进行元组定位即可得到满足条件(t1.code='a')的元组,从而和t2表的唯一的一个元组直接连接.
--oracle用的是嵌套循环连接算法,oracle进行了等价重写where条件,where t1.code='a' and t2.code='a',由于t2.code为主键,能够唯一确定一条记录,这个时候先找出t2的记录在循环遍历t1表
--这条sql,oracle和mysql的执行计划是一致的,都是用的嵌套循环连接算法
--4、两个表关联条件的列都无索引
mysql> explain extended select * from t1 inner join t2 on t1.name=t2.remarks;
+----+-------------+-------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+
| 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 5 | 100.00 | NULL |
| 1 | SIMPLE | t2 | ALL | NULL | NULL | NULL | NULL | 5 | 100.00 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+
--oracle
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5 | 365 | 7 (15)| 00:00:01 |
|* 1 | HASH JOIN | | 5 | 365 | 7 (15)| 00:00:01 |
| 2 | TABLE ACCESS FULL| T1 | 5 | 185 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| T2 | 5 | 180 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------
--mysql依然是用的块嵌套循环连接算法,因为mysql没有hash连接算法
--oracle使用的是hash连接算法
--当关联的两表记录都比较多的时候,mysql用块嵌套循环算法会不会比oracle用hash连接算法更慢呢?
---回答: 这个不好比较. 需要在相同的环境下比较才有意义(物理环境,操作系统,数据等都需要相同)
---但常规的认识,是hash连接算法好于块嵌套循环连接算法.