数据库版本:oracle11g 11.2.0.1.0
概要:
oracle优化器在统计信息和直方图准确的情况下,对组合索引并不算智能,意味着很可能因为组合索引原因导致执行计划不准
测试脚本准备
1.创建一张普通表和组合索引
create table test
as
select decode(mod(level,4), 1, 'A',2,'B',3,'C','D') type,
decode(mod(level,8), 1, 'VALID',2, 'INVALID','x') status,
level seq,
dbms_random.string('X', 30) str
from dual
connect by level <= 100000;
create index idx_test_n1 on test(type,status);
2.查看当前索引统计信息
SQL> select t.LAST_ANALYZED,t.STALE_STATS,t.* from dba_ind_statistics t where t.INDEX_NAME='IDX_TEST_N1';
LAST_ANALYZED STALE_STATS OWNER INDEX_NAME TABLE_OWNER TABLE_NAME PARTITION_NAME PARTITION_POSITION SUBPARTITION_NAME SUBPARTITION_POSITION OBJECT_TYPE BLEVEL LEAF_BLOCKS DISTINCT_KEYS AVG_LEAF_BLOCKS_PER_KEY AVG_DATA_BLOCKS_PER_KEY CLUSTERING_FACTOR NUM_ROWS AVG_CACHED_BLOCKS AVG_CACHE_HIT_RATIO SAMPLE_SIZE LAST_ANALYZED GLOBAL_STATS USER_STATS STATTYPE_LOCKED STALE_STATS
------------- ----------- ------------------------------ ------------------------------ ------------------------------ ------------------------------ ------------------------------ ------------------ ------------------------------ --------------------- ------------ ---------- ----------- ------------- ----------------------- ----------------------- ----------------- ---------- ----------------- ------------------- ----------- ------------- ------------ ---------- --------------- -----------
2021/9/23 0:1 SCOTT IDX_TEST_N1 SCOTT TEST INDEX 1 227 6 37 640 3840 100000 100000 2021/9/23 0:1 NO NO
3.查看测试sql的数量
SQL> select COUNT(1)
2 from test t
3 where t.type = 'A'
4 and t.status = 'VALID';
COUNT(1)
----------
12500
4.删除索引的统计信息
我们在创建索引时系统会自动收集索引统计信息,为了验证原理需要我们需要将其删除。
--删除索引统计信息
begin
dbms_stats.delete_index_stats('SCOTT','IDX_TEST_N1',force => true);
end;
/
--统计信息已删除
SQL> select t.LAST_ANALYZED,t.STALE_STATS,t.* from dba_ind_statistics t where t.INDEX_NAME='IDX_TEST_N1';
LAST_ANALYZED STALE_STATS OWNER INDEX_NAME TABLE_OWNER TABLE_NAME PARTITION_NAME PARTITION_POSITION SUBPARTITION_NAME SUBPARTITION_POSITION OBJECT_TYPE BLEVEL LEAF_BLOCKS DISTINCT_KEYS AVG_LEAF_BLOCKS_PER_KEY AVG_DATA_BLOCKS_PER_KEY CLUSTERING_FACTOR NUM_ROWS AVG_CACHED_BLOCKS AVG_CACHE_HIT_RATIO SAMPLE_SIZE LAST_ANALYZED GLOBAL_STATS USER_STATS STATTYPE_LOCKED STALE_STATS
------------- ----------- ------------------------------ ------------------------------ ------------------------------ ------------------------------ ------------------------------ ------------------ ------------------------------ --------------------- ------------ ---------- ----------- ------------- ----------------------- ----------------------- ----------------- ---------- ----------------- ------------------- ----------- ------------- ------------ ---------- --------------- -----------
SCOTT IDX_TEST_N1 SCOTT TEST INDEX NO NO
SQL>
未收集统计信息时——采用动态收集生成执行计划
1.获取执行计划
explain plan for
select T.*, COUNT(1) OVER()
from test t
where t.type = 'A'
and t.status = 'VALID';
select * from table(dbms_xplan.display());
2.查看执行计划
Plan hash value: 3285274135
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 13043 | 25M| 130 (0)| 00:00:02 |
| 1 | WINDOW BUFFER | | 13043 | 25M| 130 (0)| 00:00:02 |
| 2 | TABLE ACCESS BY INDEX ROWID| TEST | 13043 | 25M| 130 (0)| 00:00:02 |
|* 3 | INDEX RANGE SCAN | IDX_TEST_N1 | 13043 | | 30 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("T"."TYPE"='A' AND "T"."STATUS"='VALID')
Note
-----
- dynamic sampling used for this statement (level=2)
结论:从执行计划提示信息来看,是采用动态统计生成的(dynamic sampling used for this statement (level=2)),执行计划预估13043与我们真实12500行数基本一致,且执行计划错误。
收集统计信息时
分为两种情况:
1.只收集表的统计信息——根据组合列distinct值作乘积
①收集表的统计信息
begin
dbms_stats.gather_table_stats('SCOTT',
'TEST',
estimate_percent => 100,
method_opt => 'FOR ALL COLUMNS SIZE 1',
cascade => false);
end;
/
②查看表的统计信息
SQL> select t.HISTOGRAM,
2 T.table_name,
3 T.column_name,
4 T.num_distinct,
5 T.sample_size,
6 T.last_analyzed,
7 T.density
8 from dba_tab_col_statistics t
9 where t.table_name = 'TEST';
HISTOGRAM TABLE_NAME COLUMN_NAME NUM_DISTINCT SAMPLE_SIZE LAST_ANALYZED DENSITY
--------------- ------------------------------ ------------------------------ ------------ ----------- ------------- ----------
NONE TEST STR 100000 100000 2021/9/23 0:3 1E-5
NONE TEST SEQ 100000 100000 2021/9/23 0:3 1E-5
NONE TEST STATUS 3 100000 2021/9/23 0:3 0.33333333
NONE TEST TYPE 4 100000 2021/9/23 0:3 0.25
③其执行计划
Plan hash value: 3285274135
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8333 | 333K| 9 (0)| 00:00:01 |
| 1 | WINDOW BUFFER | | 8333 | 333K| 9 (0)| 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID| TEST | 8333 | 333K| 9 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | IDX_TEST_N1 | 1000 | | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("T"."TYPE"='A' AND "T"."STATUS"='VALID')
分析:执行计划走的是索引扫描,且预估行源差异挺大,
表的预估行源为:8333=100000/4/3
结论:在只有表的统计信息情况下,索引预估不准,且执行计划错误。
补充:这种情况优化器预估逻辑:rows=表的总行数/(组合列的distinct值乘积)
2.同时收集表与索引统计信息时
①收集二者统计信息
begin
dbms_stats.gather_table_stats('SCOTT',
'TEST',
estimate_percent => 100,
method_opt => 'FOR ALL COLUMNS SIZE 1',
cascade => true);
end;
/
②查看索引统计信息
SQL> select t.LAST_ANALYZED,
2 t.STALE_STATS,
3 t.INDEX_NAME,
4 t.DISTINCT_KEYS,
5 t.NUM_ROWS
6 from dba_ind_statistics t
7 where t.INDEX_NAME = 'IDX_TEST_N1';
LAST_ANALYZED STALE_STATS INDEX_NAME DISTINCT_KEYS NUM_ROWS
------------- ----------- ------------------------------ ------------- ----------
2021/9/23 0:5 NO IDX_TEST_N1 6 100000
③其执行计划
Plan hash value: 2494466765
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 16667 | 667K| 183 (2)| 00:00:03 |
| 1 | WINDOW BUFFER | | 16667 | 667K| 183 (2)| 00:00:03 |
|* 2 | TABLE ACCESS FULL| TEST | 16667 | 667K| 183 (2)| 00:00:03 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("T"."TYPE"='A' AND "T"."STATUS"='VALID')
分析:执行计划采用的是索引统计信息,差异不大,
此时行源:16667=100000/6
结论:在有索引统计信息时,优化器采用索引的统计信息,此时执行计划正确
补充:这种情况优化器预估逻辑:rows=索引的总行数/(索引的distinct值)
收集直方图时
1.收集直方图
begin
dbms_stats.gather_table_stats('SCOTT',
'TEST',
estimate_percent => 100,
method_opt => 'FOR ALL COLUMNS SIZE SKEWONLY',
cascade => true);
end;
/
2.查看直方图信息
SQL> select t.HISTOGRAM,
2 T.table_name,
3 T.column_name,
4 T.num_distinct,
5 T.sample_size,
6 T.last_analyzed,
7 T.density
8 from dba_tab_col_statistics t
9 where t.table_name = 'TEST';
HISTOGRAM TABLE_NAME COLUMN_NAME NUM_DISTINCT SAMPLE_SIZE LAST_ANALYZED DENSITY
--------------- ------------------------------ ------------------------------ ------------ ----------- ------------- ----------
HEIGHT BALANCED TEST STR 100000 100000 2021/9/23 1:0 1E-5
NONE TEST SEQ 100000 100000 2021/9/23 1:0 1E-5
FREQUENCY TEST STATUS 3 100000 2021/9/23 1:0 5E-6
FREQUENCY TEST TYPE 4 100000 2021/9/23 1:0 5E-6
3.其执行计划
Plan hash value: 3285274135
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3125 | 125K| 128 (0)| 00:00:02 |
| 1 | WINDOW BUFFER | | 3125 | 125K| 128 (0)| 00:00:02 |
| 2 | TABLE ACCESS BY INDEX ROWID| TEST | 3125 | 125K| 128 (0)| 00:00:02 |
|* 3 | INDEX RANGE SCAN | IDX_TEST_N1 | 3125 | | 8 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("T"."TYPE"='A' AND "T"."STATUS"='VALID')
分析:直方图一般运用在数据倾斜情况,本实验案例确实存在数据倾斜,但执行计划依然不正确。预估行源:3125=1000000.250.125
结论:此时执行计划是错的,即使收集了直方图信息,针对组合索引情况,优化器表现并不智能。
补充:直方图优化器预估逻辑:rows=表的行数*各个组合列的选择率
结语:组合索引一定要手动去算行源,再确定执行计划
附上最后两种执行计划性能比较,由于例子选的不够典型,差异不是很明显,但结论和分析是没有问题,组合索引优化主题和原理是准确的
走索引扫描的性能
全表扫描的性能
本人擅长oracle性能优化,包括不限于sql、pl/sql、awr、数据库,如需解决相关问题可以私信或留言