选择组合索引的前导列,必须根据具体的业务(SQL)写法和列的数据分布不同而不同,很多书或网上都说,前导列要选择高选择性的,但是,脱离具体的业务,这些是没有意义的,本文就举一些常见的例子来分析下如何正确选择前导列,以抛砖引玉,实际应用中,有更多复杂的情况需要具体分析。
1.都是等值条件的列,谁做前导列都一样
owner有33个不同的值,object_id有75250,显然object_id的选择性更好。但是下面的查询,应用idx1_t与idx2_t的性能一样(COST与CONSISTENT GETS一样)。
虽然如此,但是要记住,这个索引不是仅仅给这2条SQL使用的,事实上可能我们有的查询谓词只有owner或object_id,这时候得考虑使用owner作为前导列还是使用object_id作为前导列。
还有其他引用owner,object_id的情况,比如GROUP BY ,ORDER BY,甚至SELECT...都需要进行整体的分析,这样才能建立最佳的索引。
2.有的列是大于(等于)或小于(等于)或者是like 模糊匹配等不等条件,有的列是等值的条件,等值的一般作为前导列更好
--做5次,增加几十万行SYS的进去
INSERT INTO t SELECT * FROM t WHERE owner='SYS';
COMMIT;
--重新收集统计信息(省略)
上面的SQL走idx1_t,注意观察谓词,只有access,说明索引完全被利用上,很显然因为owner是前导列,而且是等值查询,按照前导列查询,然后只要分析索引的第2列object_id,当发现不满足条件object_id>=107889之后就停止了,索引扫描没有浪费。
强制使用idx2_t,object_id是前导列,谓词有access,还有filter,说明索引没有被完全利用上,这是因为object_id的不是等值查询,满足object_id>=107889的,按照顺序搜索
所以,中间可能有一些不满足owner='DINGJUN123'的,还要filter掉。
这种查询和不等值条件作为前导列的查询,一旦object_id>=107889不满足owner='DINGJUN123'的很多,那么必然造成过多不必要的索引搜索,COST与逻辑读会上升很快,
从而性能急剧下降, 因为本例子基本都满足owner条件,所以没有啥浪费。但是下面的例子:
还是使用idx1_t,没有问题。看下面的强制使用idx2_t的。
SELECT COUNT(*) FROM t WHERE object_id >= 100;
返回1032649行,但是WHERE owner='DINGJUN123' AND object_id>=100 只返回2540行,要filter掉百万行,轮询索引,造成了极大的浪费。
3.如果都是比较,都是<,>之类的表达式
这种情况,前导列,根据谓词,选择条件能够定位最接近处理结果的基数,并能够减少索引后filter的工作,因为必然有一列是要走access之后的filter,最好是filter能够过滤较少数据,不要做过多过滤。
例如:
关闭index SKIP SCAN,因为owner种类很少,oracle选择skip SCAN
alter session set "_optimizer_skip_scan_enabled" = false;
因为owner>='DINGJUN123'返回大量行,但是事实结果很少,只有几十行,过滤object_id>=107872,需要做大量工作,逻辑读和COST增大千倍+,性能低下。
后记:
当然如何选择前导列的顺序很复杂,得全盘考虑对应的谓词,SELECT的列等要素,还要考虑ORDER BY ,GROUP BY等列,比如3列组合索引,如何考虑顺序。
后续会补充更多的组合索引如何创建的要点。
1.都是等值条件的列,谁做前导列都一样
DROP TABLE t; CREATE TABLE t AS SELECT * FROM dba_objects; CREATE INDEX idx1_t ON t(owner,object_id); CREATE INDEX idx2_t ON t(object_id,owner); BEGIN dbms_stats.gather_table_stats(ownname => USER,tabname => 'T',estimate_percent => 100,cascade => TRUE); END; / dingjun123@ORADB> SELECT COUNT(DISTINCT owner),COUNT(DISTINCT object_id),COUNT(*) FROM t; COUNT(DISTINCTOWNER) COUNT(DISTINCTOBJECT_ID) COUNT(*) -------------------- ------------------------ ---------- 33 75250 75251 1 row selected. |
owner有33个不同的值,object_id有75250,显然object_id的选择性更好。但是下面的查询,应用idx1_t与idx2_t的性能一样(COST与CONSISTENT GETS一样)。
dingjun123@ORADB> SELECT/*+index(t idx1_t)*/ * FROM t 2 WHERE owner='DINGJUN123' AND object_id=75677; 1 row selected. Elapsed: 00:00:00.00 Execution Plan ---------------------------------------------------------- Plan hash value: 2071967826 -------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 97 | 2 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| T | 1 | 97 | 2 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | IDX1_T | 1 | | 1 (0)| 00:00:01 | -------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("OWNER"='DINGJUN123' AND "OBJECT_ID"=75677) Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 4 consistent gets 0 physical reads 0 redo size 1403 bytes sent via SQL*Net to client 416 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed dingjun123@ORADB> SELECT/*+index(t idx2_t)*/ * FROM t 2 WHERE owner='DINGJUN123' AND object_id=75677; 1 row selected. Elapsed: 00:00:00.01 Execution Plan ---------------------------------------------------------- Plan hash value: 3787301248 -------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 97 | 2 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| T | 1 | 97 | 2 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | IDX2_T | 1 | | 1 (0)| 00:00:01 | -------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("OBJECT_ID"=75677 AND "OWNER"='DINGJUN123') Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 4 consistent gets 0 physical reads 0 redo size 1403 bytes sent via SQL*Net to client 416 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed |
虽然如此,但是要记住,这个索引不是仅仅给这2条SQL使用的,事实上可能我们有的查询谓词只有owner或object_id,这时候得考虑使用owner作为前导列还是使用object_id作为前导列。
还有其他引用owner,object_id的情况,比如GROUP BY ,ORDER BY,甚至SELECT...都需要进行整体的分析,这样才能建立最佳的索引。
2.有的列是大于(等于)或小于(等于)或者是like 模糊匹配等不等条件,有的列是等值的条件,等值的一般作为前导列更好
--做5次,增加几十万行SYS的进去
INSERT INTO t SELECT * FROM t WHERE owner='SYS';
COMMIT;
--重新收集统计信息(省略)
dingjun123@ORADB> SELECT * FROM t 2 WHERE owner='DINGJUN123' 3 AND object_id>=107889; 1 row selected. Elapsed: 00:00:00.01 Execution Plan ---------------------------------------------------------- Plan hash value: 2071967826 -------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 96 | 4 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| T | 1 | 96 | 4 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | IDX1_T | 1 | | 3 (0)| 00:00:01 | -------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("OWNER"='DINGJUN123' AND "OBJECT_ID">=107889 AND "OBJECT_ID" IS NOT NULL) Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 5 consistent gets 0 physical reads 0 redo size 1399 bytes sent via SQL*Net to client 416 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed |
上面的SQL走idx1_t,注意观察谓词,只有access,说明索引完全被利用上,很显然因为owner是前导列,而且是等值查询,按照前导列查询,然后只要分析索引的第2列object_id,当发现不满足条件object_id>=107889之后就停止了,索引扫描没有浪费。
dingjun123@ORADB> SELECT/*+index(t idx2_t)*/ * FROM t 2 WHERE owner='DINGJUN123' 3 AND object_id>=107889; 1 row selected. Elapsed: 00:00:00.01 Execution Plan ---------------------------------------------------------- Plan hash value: 3787301248 -------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 96 | 4 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| T | 1 | 96 | 4 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | IDX2_T | 1 | | 3 (0)| 00:00:01 | -------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("OBJECT_ID">=107889 AND "OWNER"='DINGJUN123' AND "OBJECT_ID" IS NOT NULL) filter("OWNER"='DINGJUN123') Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 5 consistent gets 0 physical reads 0 redo size 1399 bytes sent via SQL*Net to client 416 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed |
强制使用idx2_t,object_id是前导列,谓词有access,还有filter,说明索引没有被完全利用上,这是因为object_id的不是等值查询,满足object_id>=107889的,按照顺序搜索
所以,中间可能有一些不满足owner='DINGJUN123'的,还要filter掉。
这种查询和不等值条件作为前导列的查询,一旦object_id>=107889不满足owner='DINGJUN123'的很多,那么必然造成过多不必要的索引搜索,COST与逻辑读会上升很快,
从而性能急剧下降, 因为本例子基本都满足owner条件,所以没有啥浪费。但是下面的例子:
dingjun123@ORADB> SELECT * FROM t 2 WHERE owner='DINGJUN123' 3 AND object_id>=100; 2540 rows selected. Elapsed: 00:00:00.15 Execution Plan ---------------------------------------------------------- Plan hash value: 2071967826 -------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 2539 | 238K| 499 (0)| 00:00:06 | | 1 | TABLE ACCESS BY INDEX ROWID| T | 2539 | 238K| 499 (0)| 00:00:06 | |* 2 | INDEX RANGE SCAN | IDX1_T | 2539 | | 12 (0)| 00:00:01 | -------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("OWNER"='DINGJUN123' AND "OBJECT_ID">=100 AND "OBJECT_ID" IS NOT NULL) Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 527 consistent gets 21 physical reads 0 redo size 268134 bytes sent via SQL*Net to client 2275 bytes received via SQL*Net from client 171 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 2540 rows processed |
还是使用idx1_t,没有问题。看下面的强制使用idx2_t的。
dingjun123@ORADB> SELECT/*+index(t idx2_t)*/ * FROM t 2 WHERE owner='DINGJUN123' 3 AND object_id>=100; 2540 rows selected. Elapsed: 00:00:00.33 Execution Plan ---------------------------------------------------------- Plan hash value: 3787301248 -------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 2539 | 238K| 3762 (1)| 00:00:46 | | 1 | TABLE ACCESS BY INDEX ROWID| T | 2539 | 238K| 3762 (1)| 00:00:46 | |* 2 | INDEX RANGE SCAN | IDX2_T | 2539 | | 3274 (1)| 00:00:40 | -------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("OBJECT_ID">=100 AND "OWNER"='DINGJUN123' AND "OBJECT_ID" IS NOT NULL) filter("OWNER"='DINGJUN123') Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 3763 consistent gets 0 physical reads 0 redo size 268134 bytes sent via SQL*Net to client 2275 bytes received via SQL*Net from client 171 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 2540 rows processed dingjun123@ORADB> SELECT COUNT(*) FROM t WHERE object_id >= 100; COUNT(*) ---------- 1032649 |
SELECT COUNT(*) FROM t WHERE object_id >= 100;
返回1032649行,但是WHERE owner='DINGJUN123' AND object_id>=100 只返回2540行,要filter掉百万行,轮询索引,造成了极大的浪费。
3.如果都是比较,都是<,>之类的表达式
这种情况,前导列,根据谓词,选择条件能够定位最接近处理结果的基数,并能够减少索引后filter的工作,因为必然有一列是要走access之后的filter,最好是filter能够过滤较少数据,不要做过多过滤。
例如:
dingjun123@ORADB> SELECT * FROM t 2 WHERE owner>='DINGJUN123' 3 AND object_id>=107872; 37 rows selected. Elapsed: 00:00:00.00 Execution Plan ---------------------------------------------------------- Plan hash value: 3787301248 -------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 205 | 19680 | 43 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| T | 205 | 19680 | 43 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | IDX2_T | 205 | | 3 (0)| 00:00:01 | -------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("OBJECT_ID">=107872 AND "OWNER">='DINGJUN123' AND "OBJECT_ID" IS NOT NULL) filter("OWNER">='DINGJUN123') Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 14 consistent gets 0 physical reads 0 redo size 6468 bytes sent via SQL*Net to client 438 bytes received via SQL*Net from client 4 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 37 rows processed dingjun123@ORADB> SELECT COUNT(*) FROM t WHERE object_id>=107872; COUNT(*) ---------- 37 1 row selected. |
关闭index SKIP SCAN,因为owner种类很少,oracle选择skip SCAN
alter session set "_optimizer_skip_scan_enabled" = false;
dingjun123@ORADB> SELECT/*+index(t idx1_t)*/ * FROM t 2 WHERE owner>='DINGJUN123' 3 AND object_id>=107872; 37 rows selected. Elapsed: 00:00:00.23 Execution Plan ---------------------------------------------------------- Plan hash value: 2071967826 -------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 205 | 19680 | 3740 (1)| 00:00:45 | | 1 | TABLE ACCESS BY INDEX ROWID| T | 205 | 19680 | 3740 (1)| 00:00:45 | |* 2 | INDEX RANGE SCAN | IDX1_T | 205 | | 3700 (1)| 00:00:45 | -------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("OWNER">='DINGJUN123' AND "OBJECT_ID">=107872 AND "OWNER" IS NOT NULL) filter("OBJECT_ID">=107872) Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 3688 consistent gets 0 physical reads 0 redo size 6468 bytes sent via SQL*Net to client 438 bytes received via SQL*Net from client 4 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 37 rows processed |
因为owner>='DINGJUN123'返回大量行,但是事实结果很少,只有几十行,过滤object_id>=107872,需要做大量工作,逻辑读和COST增大千倍+,性能低下。
后记:
当然如何选择前导列的顺序很复杂,得全盘考虑对应的谓词,SELECT的列等要素,还要考虑ORDER BY ,GROUP BY等列,比如3列组合索引,如何考虑顺序。
后续会补充更多的组合索引如何创建的要点。