笔记系列------索引

最新推荐文章于 2022-07-09 15:56:54 发布

柠檬

最新推荐文章于 2022-07-09 15:56:54 发布

阅读量159

点赞数

分类专栏： oracle 文章标签： oracle sql

本文链接：https://blog.csdn.net/weixin_43475880/article/details/120828233

版权

oracle 专栏收录该内容

99 篇文章 1 订阅

订阅专栏

解决表太大，查询效率低得方法：

index
分区partition （逻辑一致，物理块分着存）
物化视图 materialized view (分布式读写分离)
并行查询（多cpu）

1. 索引结构与特点

1.1 B树索引结构

B树索引结构（图），介绍根节点，分支节点，叶子节点，以及表行，rowid，键值，双向链等概念

注：

叶块之间使用双向链连接
删除表行时索引叶块也会更新，但只是逻辑更改，并不做物理的删除叶块
索引叶块中不保存表行键值的null信息

1.2 位图索引结构

位图索引适用于离散度较低（说明数据较为集中）的列，它的叶块中存放key，start rowid-end rowid,并应用一个函数把位图中相应key值置1，位图索引在逻辑or时效率最高

set autotrace traceonly explain;

create bitmap index job_bitmap on emp1(job);

select count(*) from emp1 where job='CLERK' or job='MANAGER';

select count(*) from emp1 where job='CLERK' and job='MANAGER';

set autotrace off;

以上操作使用autotrace可以看到优化器使用了bitmap

更多内容请参考：

https://blog.csdn.net/weixin_43475880/article/details/121946036

1.3 两种结构对比

B树索引比位图索引应用更广泛，下面重点关注B树索引

索引与表相关的可选结构，在逻辑上和物理上都独立于表的数据，索引能优化查询，不能优化DML操作，oracle自动维护索引，频繁的DML操作反而会引起大量的索引维护

如果sql语句仅访问被索引的列，那么数据库只需从索引中读取数据，而不用读取表，如果该语句同时还要访问除索引之外的列，则数据库会使用rowid来查找表中的行，通常为检索表数据，数据库以交替方式先读取索引块，后读取相应表块。

2.B数索引结构和位图索引的使用环境

B树适合情况	位图适合情况
大表，返回行数<5%	大表，返回行数<5%
经常使用where字句查询的列	经常使用where字句查询的列
离散度较高的列	离散度较低的列
更新键值代价低	更新键值代价高
逻辑and效率高	逻辑or效率高
用于OLTP	用于OLAP

3.索引的类型与选项

3.1 B树索引

1)唯一或非唯一索引 unique or non_unique:唯一索引指键值不重复

create unique index empno_idx on emp1(empno);

or

create index empno_idx on emp1(empno)

2)索引组合（composite）：绑定了两个或更多列的索引。

create index job_deptno_idx on emp1(job,deptno);

使用时若是where子句中同时限制job和deptno或是仅限制首选项job，range scan可以被应用

若是仅限制deptno，则使用全表扫描full scan

实验：

set autotrace on;
select * from emp1 where job='CLERK' and deptno=30;  
//若是使用or连接，则使用table access full,而非range scan
select * from emp1 where job='CLERK';
select * from emp1 where deptno=30;

3)反向键索引（reverse）：将字节倒置后组织键值。当使用序列产生主键索引，可以防止叶节点出现热块现象,均匀分布在索引块上

create index mgr_idx on emp1(mgr) reverse;

检查索引创建：

select * from user_index;
select * from user_idx_columns;

set autotrace on;

select * from emp1 where mgr=7698;

4)函数索引（function base）：以索引列值得函数值为键值去组织索引

create index fun_idx on emp1(lower(ename));

select * from emp1 where lower(ename)='scott';

SQL> select * from emp1 where lower(ename)='scott';

Execution Plan
----------------------------------------------------------
Plan hash value: 2906051717

---------------------------------------------------------------------------------------

| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Tim
e     |

---------------------------------------------------------------------------------------


|   0 | SELECT STATEMENT            |         |     1 |    94 |     2   (0)| 00:00:01 |


|   1 |  TABLE ACCESS BY INDEX ROWID| EMP1    |     1 |    94 |     2   (0)| 00:00:01 |

|*  2 |   INDEX RANGE SCAN          | FUN_IDX |     1 |       |     1   (0)| 00:00:01 |

---------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access(LOWER("ENAME")='scott')

Note
-----
   - dynamic sampling used for this statement (level=2)

5)压缩（compress）：重复键值只存储一次，即重复的键值在叶块中就存一次，后跟所有与之匹配的rowid字符串

create index comp_idx on emp1(sal) compress;

6)升序或降序：叶节点中的键值排列默认是升序的。

create index deptno_job_inx on emp(deptno desc,job asc);

7)可以更改索引属性：

alter index xxx...

索引相关数据字典：

user_indexes //索引主要信息

user_ind_columns //索引列信息

3.2 位图索引

4. 优化器使用索引的扫描方式

oracle执行计划常见四种索引扫描方式

1）索引唯一扫描（index unique scan）

通过唯一索引查找一个数值返回单个rowid。对于唯一组合索引，要在where的谓词‘=’后包含所有列的“布尔与”。

2）索引范围扫描（index range scan）

在非唯一索引上，可能返回多行数据，故非唯一索引上均使用索引范围扫描

使用index range scan的3种情况：

（a）在唯一索引上使用range操作符（> < <> >= <= between）
（b）在唯一组合索引列上，对组合索引使用部分列进行查询（含引导列），导致查询出多行
（c）对非唯一索引列上继续宁任何查询不含“布尔或”

SQL> select * from emp1 where empno>=7788;  //range scan

     EMPNO ENAME      JOB              MGR HIREDATE                  SAL
---------- ---------- --------- ---------- ------------------ ----------
      COMM     DEPTNO
---------- ----------
      7788 SCOTT      ANALYST         7566 19-APR-87                3000
                   20

      7839 KING       PRESIDENT            17-NOV-81                5000
                   10

      7844 TURNER     SALESMAN        7698 08-SEP-81                1500
         0         30


     EMPNO ENAME      JOB              MGR HIREDATE                  SAL
---------- ---------- --------- ---------- ------------------ ----------
      COMM     DEPTNO
---------- ----------
      7876 ADAMS      CLERK           7788 23-MAY-87                1100
                   20

      7900 JAMES      CLERK           7698 03-DEC-81                 950
                   30

      7902 FORD       ANALYST         7566 03-DEC-81                3000
                   20


     EMPNO ENAME      JOB              MGR HIREDATE                  SAL
---------- ---------- --------- ---------- ------------------ ----------
      COMM     DEPTNO
---------- ----------
      7934 MILLER     CLERK           7782 23-JAN-82                1300
                   10


7 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1726997417

-------------------------------------------------------------------------------                 ----------

| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)|                  Time     |

-------------------------------------------------------------------------------                 ----------

|   0 | SELECT STATEMENT            |           |     7 |   609 |     2   (0)|                  00:00:01 |

|   1 |  TABLE ACCESS BY INDEX ROWID| EMP1      |     7 |   609 |     2   (0)|                 00:00:01 |

|*  2 |   INDEX RANGE SCAN          | EMPNO_IDX |     7 |       |     1   (0)|                  00:00:01 |

-------------------------------------------------------------------------------                ----------


Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("EMPNO">=7788)

Note
-----
   - dynamic sampling used for this statement (level=2)



SQL> select * from emp1 where empno=7788;//inique scan

     EMPNO ENAME      JOB              MGR HIREDATE                  SAL      COMM     DEPTNO
---------- ---------- --------- ---------- ------------------ ---------- ---------- ----------
 20

      7788 SCOTT      ANALYST         7566 19-APR-87                3000
                  


Execution Plan
----------------------------------------------------------
Plan hash value: 4033643343

-----------------------------------------------------------------------------------------

| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| T
ime     |

----------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT            |           |     1 |    87 |     1   (0)| 00:00:01 |

|   1 |  TABLE ACCESS BY INDEX ROWID| EMP1      |     1 |    87 |     1   (0)| 00:00:01 |

|*  2 |   INDEX UNIQUE SCAN         | EMPNO_IDX |     1 |       |     0   (0)| 00:00:01 |

-----------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("EMPNO"=7788)

SQL>

3）索引全扫描（index full scan）

对整个index进行扫描，并且顺序的读取其中数据。

全oracle索引扫描只在CBO模式下才有效。CBO根据统计数值的值进行全oracle索引扫描比进行全表扫描更有效时，才进行全oracle索引扫描，而且此时查询出的数据都必须从索引中可直接得到

4）索引快速扫描（index fast full scan）

扫描索引中所有的数据块，与index full scan类似，但是显著区别是full scan 是根据叶子块的双向列表顺序读取，读取的块是有顺序的，也是经过排序的，故返回列表也是排序的。而fast full scan在读取叶子块时完全由物理存储决定，并采取多块读，每次读取db_file_multiblock_read_count个块。

//分析器是根据要访问的数据量和索引的聚簇因子等属性判断使用range scan或者index full scan

聚簇因子（clustering_factor）:堆表的表行物理的存储在数据块是无序的，这与插入一行记录搜选空闲块的策略有关，而索引的键值又是有序，当这两者差异越大，聚簇因子的值就越高。

实验：

1)若是scott不能使用autotrace：

conn / as sysdba
@ORACLE_HOME/rdbms/admin/utlxplan
@ORACLE_HOME/sqlplus/admin/plustrce
grant plustrace to scott,hr;

2)索引唯一扫描（index unique scan）

create unique index emp1_idx on emp1(empno);

select empno from emp1 where empno=7788;

drop index emp1_idx;

3）索引范围扫描（index range scan）

create index emp1_idx on emp1(empno);

select empno from emp1 where empno=7788;

4)索引全扫描（index full scan）

当你要查询出的数据全部可以从索引中直接得到，也就是说仅索引块而不需要读表块，此时选择index （fast）full scan

alter table emp1 modify (empno not null); //因为需要扫描叶子节点，所以索引叶子块不存在空值，使empno字段非空。

select empno from emp1； ----数据库进访问索引本身数据，无需访问表

SQL> create index empno_idx on emp1(empno);

Index created.

SQL> select empno from emp1;

     EMPNO
----------
      7369
      7499
      7521
      7566
      7654
      7698
      7782
      7788
      7839
      7844
      7876

     EMPNO
----------
      7900
      7902
      7934

14 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 2226897347

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    14 |   182 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| EMP1 |    14 |   182 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement (level=2)

SQL> alter table emp1 modify empno not null'
  2  ^C

SQL> alter table emp1 modify empno not null;

Table altered.

SQL> select empno from emp1;

     EMPNO
----------
      7369
      7499
      7521
      7566
      7654
      7698
      7782
      7788
      7839
      7844
      7876

     EMPNO
----------
      7900
      7902
      7934

14 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3564745122

------------------------------------------------------------------------------
| Id  | Operation        | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |           |    14 |   182 |     1   (0)| 00:00:01 |
|   1 |  INDEX FULL SCAN | EMPNO_IDX |    14 |   182 |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement (level=2)

从上面的实验中可以看出当，索引中是非空的，而emp1表是存在空值的，所以第一次就不是使用的index，而当modify去空值后，此时就可以使用full scan；若是emp1中存在空值而使用索引时，索引块不存空值无法返回空值，而select要求必须要返回所有值（包括空值），所以当emp1表不为空时是无法使用full scan模式的

5）索引快速扫描（index fast full scan）

insert into emp1 select * from emp1;

....

select count(empno) from emp1;

对比index full scan，当行数较多时，会使用index fast full scan

行数较多时index fast full scan比index full scan计划成本要低得多，所以CBO优化器会优选index fast full scan

select empno from emp1;

SQL> select empno from emp1;

Execution Plan
----------------------------------------------------------
Plan hash value: 2417938496

----------------------------------------------------------------------------------

| Id  | Operation            | Name      | Rows  | Bytes | Cost (%CPU)| Time |

----------------------------------------------------------------------------------

|   0 | SELECT STATEMENT     |           | 27366 |   347K|    30   (0)| 00:00:01|

|   1 |  INDEX FAST FULL SCAN| EMPNO_IDX | 27366 |   347K|    30   (0)| 00:00:01|

----------------------------------------------------------------------------------


Note
-----
   - dynamic sampling used for this statement (level=2)

可再加一行hint，强制oracle使用index full scan的执行计划，得到cpu cost是100

select /*+ index(emp1 empno_idx)*/ empno from emp1;

SQL> select /*+ index(emp1 empno_idx)*/ empno from emp1;

Execution Plan
----------------------------------------------------------
Plan hash value: 3564745122

------------------------------------------------------------------------------
| Id  | Operation        | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |           | 27366 |   347K|   103   (0)| 00:00:02 |
|   1 |  INDEX FULL SCAN | EMPNO_IDX | 27366 |   347K|   103   (0)| 00:00:02 |
------------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement (level=2)

根据上面的实验，看出fast full scan的cpu cost是30%，而full scan的cpu cost是103%

CBO优化器何时决定index full scan与index fast full scan：

共性：当仅从索引表中就可得到所要的查询结果，省去了第二部扫描块表。

个性：index fast full scan 可以使用多块读，多块读由参数db_file_multiblock_read_count指定，适用于表行多时IO效率更高，而对于索引上的order by之类的操作又几乎总是使用index full scan（因为fast full scan是直接读物理块，是无序的；而full scan是依靠叶子之间的双向链，是有序的；order by需要排序，而fast full scan不满足排序的要求，故而一般都是使用full scan）

5.索引的碎片问题

由于对基表做DML操作，便导致对索引块的自动更改操作，尤其是基表的delete操作会引起index表的index entries的逻辑删除。只有当一个索引块中的全部index entry都被删除了，这个块才会被收回。如果update基表索引列，则索引块会发生entry delete，再entry insert，这些动作都可能产生索引碎片

create table t (id int);

create index ind_1 on t(id);

begin 
  for i in 1..1000000 loop
    insert into t values (i);
      if mod(i,100)=0 then
      commit;
      end if;
  end loop;
end;
/

analyze index ind_1 validate structure;
select name,height,pct_used,del_lf_rows/lf_rows from index_stats;
NAME                               HEIGHT   PCT_USED DEL_LF_ROWS/LF_ROWS
------------------------------ ---------- ---------- -------------------
IND_1                                   3        100                   0

//作为是否需要去做碎片整理的依据，height是索引的高度（根结点，一个分支节点，
叶子节点，则高度为3）;如果height结果为4，则需要整理碎片

delete t where rownum<700000;
alter index ind_1 rebuild [online] [tablespace name]

oralce文档中并未清晰给出索引碎片的量化标准，oracle建议通过segment advisor（段顾问）解决表和索引的碎片问题，说明积累的碎片应该整理了。

1.height >= 4
2.pct_used < 50%
3.del_lf_rows/lf_rows>0.2

联机重建索引通常比删除后再重建要更实用，oracle 9i和10g一直提供联机索引重建功能--rebuild online，但是由于涉及到使用表的排他锁，通常一段时间内其他用户的DML操作需要等待

在11g中，重新设计了锁模式，因此容许DML操作不受干扰。

alter index ind_1 coalesce; //索引融合，比rebuild 动作轻(rebuild是重建索引表)，可以合并一些块中的index entries；

6.索引不可见（invisiable）

可在创建索引时指定invisible属性或者用alter 语句来修改索引为invisible（visible）

create table test(id int,name char(10));
create index test_idx on test(id);
alter index test_idx invisible;

select index_name,status,visibility from user_indexes;
INDEX_NAME                     STATUS   VISIBILIT
------------------------------ -------- ---------
TEST_IDX                       VALID    INVISIBLE
IND_1                          VALID    VISIBLE

索引被设定为invisible后，视图user_indexes 中status字段仍是valid，实际上就是指该索引对于优化器不可见，但索引的正常更新仍会由oracle自动完成。

也可以索引还在，但是优化器不可见，而且不会正常更新

alter index test_idx unusable;
select index_name,status VISIBILITY from user_indexes;

索引被设定为unusable后，视图user_indexes 中status字段是unusable

查询索引的2个动态试图：

select * from dba_indexes;

select * from dba_ind_columns;

柠檬

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
笔记系列------索引

1. 索引结构与特点1.1 B树索引结构1.2 位图索引结构2.B数索引结构和位图索引的使用环境3.索引的类型与选项3.1 B树索引3.2 位图索引4. 优化器使用索引的扫描方式5.索引的碎片问题6.索引不可见（invisiable）...
复制链接

扫一扫

专栏目录