oracle组合索引原理

数据库版本:oracle11g 11.2.0.1.0

概要:

oracle优化器在统计信息和直方图准确的情况下,对组合索引并不算智能,意味着很可能因为组合索引原因导致执行计划不准

测试脚本准备
1.创建一张普通表和组合索引
create table test 
as 
select decode(mod(level,4), 1, 'A',2,'B',3,'C','D') type,
       decode(mod(level,8), 1, 'VALID',2, 'INVALID','x') status,
       level seq,
       dbms_random.string('X', 30) str
  from dual
connect by level <= 100000;

create index idx_test_n1 on test(type,status);
2.查看当前索引统计信息
SQL> select t.LAST_ANALYZED,t.STALE_STATS,t.* from dba_ind_statistics t where t.INDEX_NAME='IDX_TEST_N1';

LAST_ANALYZED STALE_STATS OWNER                          INDEX_NAME                     TABLE_OWNER                    TABLE_NAME                     PARTITION_NAME                 PARTITION_POSITION SUBPARTITION_NAME              SUBPARTITION_POSITION OBJECT_TYPE      BLEVEL LEAF_BLOCKS DISTINCT_KEYS AVG_LEAF_BLOCKS_PER_KEY AVG_DATA_BLOCKS_PER_KEY CLUSTERING_FACTOR   NUM_ROWS AVG_CACHED_BLOCKS AVG_CACHE_HIT_RATIO SAMPLE_SIZE LAST_ANALYZED GLOBAL_STATS USER_STATS STATTYPE_LOCKED STALE_STATS
------------- ----------- ------------------------------ ------------------------------ ------------------------------ ------------------------------ ------------------------------ ------------------ ------------------------------ --------------------- ------------ ---------- ----------- ------------- ----------------------- ----------------------- ----------------- ---------- ----------------- ------------------- ----------- ------------- ------------ ---------- --------------- -----------
2021/9/23 0:1             SCOTT                          IDX_TEST_N1                    SCOTT                          TEST                                                                                                                                  INDEX                 1         227             6                      37                     640              3840     100000                                            100000 2021/9/23 0:1 NO           NO                         

3.查看测试sql的数量
SQL> select COUNT(1)
  2    from test t
  3   where t.type = 'A'
  4     and t.status = 'VALID';

  COUNT(1)
----------
     12500
4.删除索引的统计信息

我们在创建索引时系统会自动收集索引统计信息,为了验证原理需要我们需要将其删除。

--删除索引统计信息
begin
  dbms_stats.delete_index_stats('SCOTT','IDX_TEST_N1',force => true);
  end;
/
--统计信息已删除
SQL> select t.LAST_ANALYZED,t.STALE_STATS,t.* from dba_ind_statistics t where t.INDEX_NAME='IDX_TEST_N1';

LAST_ANALYZED STALE_STATS OWNER                          INDEX_NAME                     TABLE_OWNER                    TABLE_NAME                     PARTITION_NAME                 PARTITION_POSITION SUBPARTITION_NAME              SUBPARTITION_POSITION OBJECT_TYPE      BLEVEL LEAF_BLOCKS DISTINCT_KEYS AVG_LEAF_BLOCKS_PER_KEY AVG_DATA_BLOCKS_PER_KEY CLUSTERING_FACTOR   NUM_ROWS AVG_CACHED_BLOCKS AVG_CACHE_HIT_RATIO SAMPLE_SIZE LAST_ANALYZED GLOBAL_STATS USER_STATS STATTYPE_LOCKED STALE_STATS
------------- ----------- ------------------------------ ------------------------------ ------------------------------ ------------------------------ ------------------------------ ------------------ ------------------------------ --------------------- ------------ ---------- ----------- ------------- ----------------------- ----------------------- ----------------- ---------- ----------------- ------------------- ----------- ------------- ------------ ---------- --------------- -----------
                          SCOTT                          IDX_TEST_N1                    SCOTT                          TEST                                                                                                                                  INDEX                                                                                                                                                                                          NO           NO                         

SQL> 
未收集统计信息时——采用动态收集生成执行计划
1.获取执行计划
explain plan for 
select T.*, COUNT(1) OVER()
  from test t
 where t.type = 'A'
   and t.status = 'VALID';
   
select * from table(dbms_xplan.display());
2.查看执行计划
Plan hash value: 3285274135
 
--------------------------------------------------------------------------------------------
| Id  | Operation                    | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |             | 13043 |    25M|   130   (0)| 00:00:02 |
|   1 |  WINDOW BUFFER               |             | 13043 |    25M|   130   (0)| 00:00:02 |
|   2 |   TABLE ACCESS BY INDEX ROWID| TEST        | 13043 |    25M|   130   (0)| 00:00:02 |
|*  3 |    INDEX RANGE SCAN          | IDX_TEST_N1 | 13043 |       |    30   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   3 - access("T"."TYPE"='A' AND "T"."STATUS"='VALID')
 
Note
-----
   - dynamic sampling used for this statement (level=2)

结论:从执行计划提示信息来看,是采用动态统计生成的(dynamic sampling used for this statement (level=2)),执行计划预估13043与我们真实12500行数基本一致,且执行计划错误。
收集统计信息时

分为两种情况:

1.只收集表的统计信息——根据组合列distinct值作乘积
①收集表的统计信息
begin
  dbms_stats.gather_table_stats('SCOTT',
                                'TEST',
                                estimate_percent => 100,
                                method_opt       => 'FOR ALL COLUMNS SIZE 1',
                                cascade => false);
end;
/
②查看表的统计信息
SQL> select t.HISTOGRAM,
  2         T.table_name,
  3         T.column_name,
  4         T.num_distinct,
  5         T.sample_size,
  6         T.last_analyzed,
  7         T.density
  8    from dba_tab_col_statistics t
  9   where t.table_name = 'TEST';

HISTOGRAM       TABLE_NAME                     COLUMN_NAME                    NUM_DISTINCT SAMPLE_SIZE LAST_ANALYZED    DENSITY
--------------- ------------------------------ ------------------------------ ------------ ----------- ------------- ----------
NONE            TEST                           STR                                  100000      100000 2021/9/23 0:3       1E-5
NONE            TEST                           SEQ                                  100000      100000 2021/9/23 0:3       1E-5
NONE            TEST                           STATUS                                    3      100000 2021/9/23 0:3 0.33333333
NONE            TEST                           TYPE                                      4      100000 2021/9/23 0:3       0.25

③其执行计划
Plan hash value: 3285274135
 
--------------------------------------------------------------------------------------------
| Id  | Operation                    | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |             |  8333 |   333K|     9   (0)| 00:00:01 |
|   1 |  WINDOW BUFFER               |             |  8333 |   333K|     9   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| TEST        |  8333 |   333K|     9   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | IDX_TEST_N1 |  1000 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   3 - access("T"."TYPE"='A' AND "T"."STATUS"='VALID')

分析:执行计划走的是索引扫描,且预估行源差异挺大,
表的预估行源为:8333=100000/4/3

结论:在只有表的统计信息情况下,索引预估不准,且执行计划错误。
补充:这种情况优化器预估逻辑:rows=表的总行数/(组合列的distinct值乘积)
2.同时收集表与索引统计信息时
①收集二者统计信息
begin
  dbms_stats.gather_table_stats('SCOTT',
                                'TEST',
                                estimate_percent => 100,
                                method_opt       => 'FOR ALL COLUMNS SIZE 1',
                                cascade => true);
end;
/
②查看索引统计信息
SQL> select t.LAST_ANALYZED,
  2         t.STALE_STATS,
  3         t.INDEX_NAME,
  4         t.DISTINCT_KEYS,
  5         t.NUM_ROWS
  6    from dba_ind_statistics t
  7   where t.INDEX_NAME = 'IDX_TEST_N1';

LAST_ANALYZED STALE_STATS INDEX_NAME                     DISTINCT_KEYS   NUM_ROWS
------------- ----------- ------------------------------ ------------- ----------
2021/9/23 0:5 NO          IDX_TEST_N1                                6     100000

③其执行计划
Plan hash value: 2494466765
 
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 16667 |   667K|   183   (2)| 00:00:03 |
|   1 |  WINDOW BUFFER     |      | 16667 |   667K|   183   (2)| 00:00:03 |
|*  2 |   TABLE ACCESS FULL| TEST | 16667 |   667K|   183   (2)| 00:00:03 |
---------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - filter("T"."TYPE"='A' AND "T"."STATUS"='VALID')

分析:执行计划采用的是索引统计信息,差异不大,
此时行源:16667=100000/6

结论:在有索引统计信息时,优化器采用索引的统计信息,此时执行计划正确
补充:这种情况优化器预估逻辑:rows=索引的总行数/(索引的distinct值)
收集直方图时
1.收集直方图
begin
  dbms_stats.gather_table_stats('SCOTT',
                                'TEST',
                                estimate_percent => 100,
                                method_opt       => 'FOR ALL COLUMNS SIZE SKEWONLY',
                                cascade          => true);
end;
/
2.查看直方图信息
SQL> select t.HISTOGRAM,
  2         T.table_name,
  3         T.column_name,
  4         T.num_distinct,
  5         T.sample_size,
  6         T.last_analyzed,
  7         T.density
  8    from dba_tab_col_statistics t
  9   where t.table_name = 'TEST';

HISTOGRAM       TABLE_NAME                     COLUMN_NAME                    NUM_DISTINCT SAMPLE_SIZE LAST_ANALYZED    DENSITY
--------------- ------------------------------ ------------------------------ ------------ ----------- ------------- ----------
HEIGHT BALANCED TEST                           STR                                  100000      100000 2021/9/23 1:0       1E-5
NONE            TEST                           SEQ                                  100000      100000 2021/9/23 1:0       1E-5
FREQUENCY       TEST                           STATUS                                    3      100000 2021/9/23 1:0       5E-6
FREQUENCY       TEST                           TYPE                                      4      100000 2021/9/23 1:0       5E-6

3.其执行计划
Plan hash value: 3285274135
 
--------------------------------------------------------------------------------------------
| Id  | Operation                    | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |             |  3125 |   125K|   128   (0)| 00:00:02 |
|   1 |  WINDOW BUFFER               |             |  3125 |   125K|   128   (0)| 00:00:02 |
|   2 |   TABLE ACCESS BY INDEX ROWID| TEST        |  3125 |   125K|   128   (0)| 00:00:02 |
|*  3 |    INDEX RANGE SCAN          | IDX_TEST_N1 |  3125 |       |     8   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   3 - access("T"."TYPE"='A' AND "T"."STATUS"='VALID')

分析:直方图一般运用在数据倾斜情况,本实验案例确实存在数据倾斜,但执行计划依然不正确。预估行源:3125=1000000.250.125

结论:此时执行计划是错的,即使收集了直方图信息,针对组合索引情况,优化器表现并不智能。
补充:直方图优化器预估逻辑:rows=表的行数*各个组合列的选择率
结语:组合索引一定要手动去算行源,再确定执行计划
附上最后两种执行计划性能比较,由于例子选的不够典型,差异不是很明显,但结论和分析是没有问题,组合索引优化主题和原理是准确的
走索引扫描的性能

在这里插入图片描述

全表扫描的性能

在这里插入图片描述

本人擅长oracle性能优化,包括不限于sql、pl/sql、awr、数据库,如需解决相关问题可以私信或留言
  • 2
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值