《一道SQL考题的思考》这篇文章,引起了很多朋友和前辈的疑问和建议,很受用,所以还是有必要补充些内容。
问题1,方案1执行update,select的结果应该是(1e,2b),存在不匹配的记录,不会进行更新,是我贴错了,我的锅,
SQL> update testupdate1
set name=(select name from testupdate2
where testupdate1.id=testupdate2.id)
where testupdate1.id in (select id from testupdate2);
1 row updated.
SQL> select * from testupdate1;
ID NAME
---------- ----------
1 e
2 此处不为空,而是原值b
问题2,方案3的“and testupdate1.id=1”可以删除,当时为了测试,粘图的时候,选择错了,我的锅,
SQL> update
(select testupdate1.name t1name, testupdate2.name t2name from testupdate1, testupdate2
where testupdate1.id=testupdate2.id
<删除and testupdate1.id=1>) t set t1name=t2name;
1 row updated.
SQL> select * from testupdate1;
ID NAME
---------- ----------
1 e
2 b
问题3,update t1,t2 set t1.name=t2.name where t1.id=t2.id;行不行?
Oracle中update语法格式如下,
UPDATE [ hint ]
{ dml_table_expression_clause
| ONLY (dml_table_expression_clause)
} [ t_alias ]
update_set_clause
[ where_clause ]
[ returning_clause ]
[error_logging_clause] ;
update跟着两张表,会提示错误,显然不支持这种操作,
SQL> update t1,t2 set t1.name=t2.name where t1.id=t2.id;
update t1,t2 set t1.name=t2.name where t1.id=t2.id
*
ERROR at line 1:
ORA-00971: missing SET keyword
MySQL中,文档中提到了Multiple-table,update语法格式如下,
他是支持这种跟着多张表的格式,按照where条件,更新正确的表,
For the multiple-table syntax, UPDATE updates rows in each table named in table_references that satisfy the conditions. Each matching row is updated once, even if it matches the conditions multiple times. For multiple-table syntax, ORDER BY and LIMIT cannot be used.
这是一位读者,发给我的截图,从操作层面,论证了这个问题,
SQL> update testupdate1 t1, testupdate2 t2
set t1.name = t2.name
where t1.id = t2.id;
共1行受到影响
问题4,TigerLiu老师提出来,“说SQL写法一般要提及性能。这个SQL最后还只是停留在功能上,没有性能优劣的说明”,很有道理,所以我们比较下这几个方案对应的执行计划,看能不能从中看出一些性能方面的问题。
前提:每张表存在id字段的主键索引。
方案1,
SQL> update testupdate1
set name=(select name from testupdate2
where testupdate1.id=testupdate2.id)
where testupdate1.id in (select id from testupdate2);
1 row updated.
当两张表只有少量数据时他的执行计划,如下所示,
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | 2 | 66 | 7 (29)| 00:00:01 |
| 1 | UPDATE | TESTUPDATE1 | | | | |
| 2 | NESTED LOOPS | | 2 | 66 | 3 (0)| 00:00:01 |
| 3 | INDEX FULL SCAN | PK_TESTUPDATE2 | 3 | 39 | 1 (0)| 00:00:01 |
|* 4 | INDEX UNIQUE SCAN | PK_TESTUPDATE1 | 1 | 20 | 0 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID| TESTUPDATE2 | 1 | 20 | 1 (0)| 00:00:01 |
|* 6 | INDEX UNIQUE SCAN | PK_TESTUPDATE2 | 1 | | 0 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("TESTUPDATE1"."ID"="ID")
6 - access("TESTUPDATE2"."ID"=:B1)
当两张表各存在100万条数据时,匹配更新100万条数据,
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | 1108K| 34M| 2217K (50)| 07:23:34 |
| 1 | UPDATE | TESTUPDATE1 | | | | |
| 2 | NESTED LOOPS | | 1108K| 34M| 687 (11)| 00:00:09 |
| 3 | TABLE ACCESS FULL | TESTUPDATE1 | 1124K| 21M| 625 (2)| 00:00:08 |
|* 4 | INDEX UNIQUE SCAN | PK_TESTUPDATE2 | 1 | 13 | 0 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID| TESTUPDATE2 | 1 | 20 | 1 (0)| 00:00:01 |
|* 6 | INDEX UNIQUE SCAN | PK_TESTUPDATE2 | 1 | | 0 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("TESTUPDATE1"."ID"="ID")
6 - access("TESTUPDATE2"."ID"=:B1)
当两张表各存在100万条数据时,匹配更新1条数据,
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | 1107K| 34M| 2215K (50)| 07:23:12 |
| 1 | UPDATE | TESTUPDATE1 | | | | |
| 2 | NESTED LOOPS | | 1107K| 34M| 687 (11)| 00:00:09 |
| 3 | TABLE ACCESS FULL | TESTUPDATE1 | 1124K| 21M| 625 (2)| 00:00:08 |
|* 4 | INDEX UNIQUE SCAN | PK_TESTUPDATE2 | 1 | 13 | 0 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID| TESTUPDATE2 | 1 | 20 | 1 (0)| 00:00:01 |
|* 6 | INDEX UNIQUE SCAN | PK_TESTUPDATE2 | 1 | | 0 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("TESTUPDATE1"."ID"="ID")
6 - access("TESTUPDATE2"."ID"=:B1)
方案2,
SQL> update testupdate1
set name=(select name from testupdate2
where testupdate1.id=testupdate2.id)
where exists (select 1 from testupdate2
where testupdate2.id=testupdate1.id);
1 row updated.
当两张表只有少量数据时,他的执行计划,如下所示,
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | 2 | 66 | 7 (29)| 00:00:01 |
| 1 | UPDATE | TESTUPDATE1 | | | | |
| 2 | NESTED LOOPS SEMI | | 2 | 66 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL | TESTUPDATE1 | 2 | 40 | 3 (0)| 00:00:01 |
|* 4 | INDEX UNIQUE SCAN | PK_TESTUPDATE2 | 3 | 39 | 0 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID| TESTUPDATE2 | 1 | 20 | 1 (0)| 00:00:01 |
|* 6 | INDEX UNIQUE SCAN | PK_TESTUPDATE2 | 1 | | 0 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("TESTUPDATE2"."ID"="TESTUPDATE1"."ID")
6 - access("TESTUPDATE2"."ID"=:B1)
当两张表各存在100万条数据时,匹配更新100万条数据,
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | 1108K| 34M| 2217K (50)| 07:23:34 |
| 1 | UPDATE | TESTUPDATE1 | | | | |
| 2 | NESTED LOOPS SEMI | | 1108K| 34M| 687 (11)| 00:00:09 |
| 3 | TABLE ACCESS FULL | TESTUPDATE1 | 1124K| 21M| 625 (2)| 00:00:08 |
|* 4 | INDEX UNIQUE SCAN | PK_TESTUPDATE2 | 1092K| 13M| 0 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID| TESTUPDATE2 | 1 | 20 | 1 (0)| 00:00:01 |
|* 6 | INDEX UNIQUE SCAN | PK_TESTUPDATE2 | 1 | | 0 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("TESTUPDATE2"."ID"="TESTUPDATE1"."ID")
6 - access("TESTUPDATE2"."ID"=:B1)
当两张表各存在100万条数据时,匹配更新1条数据,
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | 1107K| 34M| 2215K (50)| 07:23:11 |
| 1 | UPDATE | TESTUPDATE1 | | | | |
| 2 | NESTED LOOPS SEMI | | 1107K| 34M| 687 (11)| 00:00:09 |
| 3 | TABLE ACCESS FULL | TESTUPDATE1 | 1124K| 21M| 625 (2)| 00:00:08 |
|* 4 | INDEX UNIQUE SCAN | PK_TESTUPDATE2 | 1090K| 13M| 0 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID| TESTUPDATE2 | 1 | 20 | 1 (0)| 00:00:01 |
|* 6 | INDEX UNIQUE SCAN | PK_TESTUPDATE2 | 1 | | 0 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("TESTUPDATE2"."ID"="TESTUPDATE1"."ID")
6 - access("TESTUPDATE2"."ID"=:B1)
我们看这两个SQL对应的执行计划,有几个小区别,
1.当两张表只有少量数据时,方案1在第一个子查询中对TESTUPDATE1的主键索引执行索引唯一扫描,对TESTUPDATE2的主键索引执行索引全扫描,方案2在第一个子查询中对TESTUPDATE1执行全表扫描,对TESTUPDATE2的主键索引执行索引唯一扫描。
2.当两张表只有少量数据时,方案1对第一个子查询的两个结果集执行的是嵌套循环连接,而方案2使用的嵌套循环半连接,前者会遍历结果集中的每条记录,而后者一旦第一条结果出来,就停止继续进行执行。
3.当两张表各存在100万条数据时,无论匹配100万条数据的更新,还是匹配1条数据的更新,这两个方案的执行计划都是相同的,而且对TESTUPDATE1表执行全表扫描,所以消耗很大。
因此,当两张表只有少量数据时,方案1和2在性能上并无差别,当两张表各存在100万条数据时,对TESTUPDATE1的全表扫描,就可能成为性能“木桶”的短板。
方案3,
SQL> update (select testupdate1.name t1name, testupdate2.name t2name from testupdate1, testupdate2 where testupdate1.id=testupdate2.id) t set t1name=t2name;
1 row updated.
当两张表只有少量数据时,他的执行计划,如下所示,
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | 2 | 80 | 5 (0)| 00:00:01 |
| 1 | UPDATE | TESTUPDATE1 | | | | |
| 2 | NESTED LOOPS | | 2 | 80 | 5 (0)| 00:00:01 |
| 3 | NESTED LOOPS | | 2 | 80 | 5 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL | TESTUPDATE1 | 2 | 40 | 3 (0)| 00:00:01 |
|* 5 | INDEX UNIQUE SCAN | PK_TESTUPDATE2 | 1 | | 0 (0)| 00:00:01 |
| 6 | TABLE ACCESS BY INDEX ROWID | TESTUPDATE2 | 1 | 20 | 1 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("TESTUPDATE1"."ID"="TESTUPDATE2"."ID")
当两张表各存在100万条数据时,匹配更新100万条数据,
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | 1108K| 42M| | 4650 (1)| 00:00:56 |
| 1 | UPDATE | TESTUPDATE1 | | | | | |
|* 2 | HASH JOIN | | 1108K| 42M| 33M| 4650 (1)| 00:00:56 |
| 3 | TABLE ACCESS FULL| TESTUPDATE2 | 1108K| 21M| | 625 (2)| 00:00:08 |
| 4 | TABLE ACCESS FULL| TESTUPDATE1 | 1124K| 21M| | 625 (2)| 00:00:08 |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("TESTUPDATE1"."ID"="TESTUPDATE2"."ID")
当两张表各存在100万条数据时,匹配更新1条数据,
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | 1107K| 42M| | 4649 (1)| 00:00:56 |
| 1 | UPDATE | TESTUPDATE1 | | | | | |
|* 2 | HASH JOIN | | 1107K| 42M| 33M| 4649 (1)| 00:00:56 |
| 3 | TABLE ACCESS FULL| TESTUPDATE2 | 1107K| 21M| | 625 (2)| 00:00:08 |
| 4 | TABLE ACCESS FULL| TESTUPDATE1 | 1124K| 21M| | 625 (2)| 00:00:08 |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("TESTUPDATE1"."ID"="TESTUPDATE2"."ID")
这种内联视图的方式,和方案2很像,但是当两张表只有少量数据时,方案2多了一次对TESTUPDATE2主键索引的索引唯一扫描,方案3无需回表,当两张表各存在100万条数据时,无论匹配100万条数据的更新,还是匹配1条数据的更新,他们的执行计划都是相同的,都需要对两张表执行全表扫描,再做哈希连接,相比方案1和2的嵌套循环连接,显然性能上更好。
方案4,
SQL> merge into testupdate1 t1
using (select testupdate2.name t2name, testupdate2.id t2id from testupdate2) t2
on (t2.t2id=t1.id)
when matched then
update set t1.name=t2.t2name;
1 row merged.
其实这块能简写,因为TESTUPDATE2只是id和name字段,所以直接使用using testupdate2,不用再写select,
SQL> merge into testupdate1 t1
using testupdate2 t2
on (t2.id=t1.id)
when matched then
update set t1.name=t2.name;
1 row merged.
当两张表只有少量数据时,他的执行计划,如下所示,
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | MERGE STATEMENT | | 2 | 28 | 5 (0)| 00:00:01 |
| 1 | MERGE | TESTUPDATE1 | | | | |
| 2 | VIEW | | | | | |
| 3 | NESTED LOOPS | | 2 | 104 | 5 (0)| 00:00:01 |
| 4 | NESTED LOOPS | | 2 | 104 | 5 (0)| 00:00:01 |
| 5 | TABLE ACCESS FULL | TESTUPDATE1 | 2 | 64 | 3 (0)| 00:00:01 |
|* 6 | INDEX UNIQUE SCAN | PK_TESTUPDATE2 | 1 | | 0 (0)| 00:00:01 |
| 7 | TABLE ACCESS BY INDEX ROWID| TESTUPDATE2 | 1 | 20 | 1 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
6 - access("TESTUPDATE2"."ID"="T1"."ID")
当两张表各存在100万条数据时,匹配更新100万条数据,
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | MERGE STATEMENT | | 1108K| 14M| | 5288 (1)| 00:01:04 |
| 1 | MERGE | TESTUPDATE1 | | | | | |
| 2 | VIEW | | | | | | |
|* 3 | HASH JOIN | | 1108K| 54M| 33M| 5288 (1)| 00:01:04 |
| 4 | TABLE ACCESS FULL| TESTUPDATE2 | 1108K| 21M| | 625 (2)| 00:00:08 |
| 5 | TABLE ACCESS FULL| TESTUPDATE1 | 1124K| 34M| | 625 (2)| 00:00:08 |
--------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("TESTUPDATE2"."ID"="T1"."ID")
当两张表各存在100万条数据时,匹配更新1条数据,
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | MERGE STATEMENT | | 1107K| 14M| | 5286 (1)| 00:01:04 |
| 1 | MERGE | TESTUPDATE1 | | | | | |
| 2 | VIEW | | | | | | |
|* 3 | HASH JOIN | | 1107K| 54M| 33M| 5286 (1)| 00:01:04 |
| 4 | TABLE ACCESS FULL| TESTUPDATE2 | 1107K| 21M| | 625 (2)| 00:00:08 |
| 5 | TABLE ACCESS FULL| TESTUPDATE1 | 1124K| 34M| | 625 (2)| 00:00:08 |
--------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("TESTUPDATE2"."ID"="T1"."ID")
可以看到,操作表的关键步骤和方案3相同,但是方案4的merge写法稍微复杂,优点则是merge操作还可以支持insert,在功能上,更丰富些。
根据以上实验,
1.从性能考量,如果两表数据量较小(几条、几十条),这4个方案,并无明显差别,方案1和2其实是同一种写法,如果数据量很大(实验中的100万),推荐方案3和4,但是3需要创建唯一索引(或者主键索引),否则会抛异常。
2.从SQL复杂度考量,方案3比方案4简单,还是刚才说的,方案3需要前提(唯一索引/主键索引)。
3.从功能考量,方案4还可以支持insert,比其他方案功能更强。
因此,方案4可能更推荐。
另外,TigerLiu老师改进了写法,加了判断条件,如下所示,目的是当源和目标的数据如果相同,则不做修改。如果不加这个条件,在MySQL里面是自动不做修改的,Oracle则不管,毕竟,少做事,就是一种优化。
从这次的案例,至少让我学到了,对待任何发表的文字,一定要严谨,贴上来的实验、数据,要准确,避免误导读者,同时,一条SQL的质量高低,不仅要满足功能的需求,而且要能提供良好的性能,缺一不可。在这个论证过程当中,TigerLiu老师探究问题根源的严谨性和SQL精通程度着实让我敬佩,从这些前辈身上,看到了自身差距,受益匪浅,还是有太多需要学习的。
碰到问题的时候,还是要多问一句为什么,往往就可以了解到很多深层次的隐藏知识,毕竟不像在学校,老师能告诉你,能为你指点方向,在职场上,有些问题只能自己挖掘,才可以让自己向强者更进一步,和读者们共勉。