1、B*树索引
1.1 概述
B*树索引是Oracle默认的索引结构。我们使用CREATE INDEX语句创建索引时,创建的就是B*树索引。B*树索引的结构一个二
叉树,由根节点(root node)、分支部分(branch node)和叶子节点(leaf node)构成。
提示:B*树的B是单词Balanced的缩写,平衡之意。
● 根节点:包含指向分支节点的信息。
● 分支节点:包含指向下一级分支节点或指向叶子节点的信息。
● 叶子节点:包含索引列的值和对应数据记录行的ROWID的信息。另外叶子节点还存有指向相邻叶子节点的信息,叶子节点之间
的指向是双向的,这样叶子节点就组成成了一个双向链接(如下图所示)。又因为索引的建立是经过排序的,所以叶子节点
之间是按排序链接的。叶子节点都具体相同的深度。
下面通过一个图来描述B*树索引的结构图:
B*树索引拥有一个树形的结构,Oracle索引检索数据,先从根节点开始,然后经过分支节点,找到对应叶子点上的字段值,因为
叶子节点存有对应数据记录行的ROWID,这样就能通过ROWID找到表中数据记录了。
1.2 B*树索引特点
下面列出B*树索引的一些特点:
① B树索引建立是忽略字段的NULL值的。
因为B*树索引中忽略了NULL值,所以如果SQL的where子句中有‘字段 IS NULL’或‘字段 IS NOT NULL’的条件时,即便该字
建立了索引,Oracle执行SQL时也不会走索引查询的。
比如创建一个t_student表,并在name字段上创建一个唯一索引,然后查看下面两个语句的执行计划:
(1) SELECT * FROM t_student WHERE name IS NULL;
和
(2) SELECT * FROM t_student WHERE name = 'CHEN';
发现语句(1)没有走索引,而语句(2)是走索引的。具体操作如下
创建表和索引:
create table T_STUDENT
(
gid NUMBER(38) not null,
name VARCHAR2(100) not null,
age NUMBER(3),
sex VARCHAR2(1),
grade NUMBER(4),
description VARCHAR2(1000)
)
create unique index STUDENT_NAME_INDEX on T_STUDENT (NAME)
tablespace TEST_DATA
pctfree 10
initrans 2
maxtrans 255
storage
(
initial 64K
next 1M
minextents 1
maxextents unlimited
);
SELECT * FROM t_student WHERE name IS NULL;
查看执行计划:
SQL> EXPLAIN PLAN
2 FOR SELECT * FROM t_student WHERE name IS NULL;
Explained
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY());
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2963365469
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 0 (0)| |
|* 1 | FILTER | | | | | |
| 2 | TABLE ACCESS FULL| T_STUDENT | 4 | 52 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(NULL IS NOT NULL)
14 rows selected
SQL> EXPLAIN PLAN
2 FOR SELECT * FROM t_student WHERE name = 'CHEN';
Explained
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY());
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 3397000786
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 1
| 1 | TABLE ACCESS BY INDEX ROWID| T_STUDENT | 1 | 13 | 1
|* 2 | INDEX UNIQUE SCAN | STUDENT_NAME_INDEX | 1 | | 0
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("NAME"='CHEN')
14 rows selected
② 无论对于大数据表还是小数据表,B*树索引的效率几乎是相同的,因为B*树索引索引叶子节点具有相同深度。
③ B*树索引对于大表可以提供更加明显更好的性能,因为索引的深度一般不会超过4(1个头节点,2级分支节点和1级叶子
节点)。事实上,由于头块几乎总是被加载到内存中,分支节点也通常被加载到内存中,因此检索索引时需要进行的物
理磁盘访问通常只有1到2次。
④ B*索引支持范围查询(包括<、>、between、like模糊查询)和精度查询(=)。因为每个叶子节点可以链接到相邻的前
后两个叶子节点,而且叶子节点链是有序的。
⑤ 检索B*树索引节点,比对是从首字母开始比对的,如果查询时条件时前模糊的(比如like '%chen'),将不走索引。
如下语句是不走索引的:
SQL> EXPLAIN PLAN
2 FOR SELECT * FROM t_student WHERE name like '%CHEN';
Explained
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY());
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 3237932320
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T_STUDENT | 1 | 13 | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("NAME" LIKE '%CHEN')
13 rows selected
⑥ Oracle会自动为主键和唯一键约束创建索引。
1.3 不走索引的情况
create table T_STUDENT
(
gid NUMBER(38) not null,
name VARCHAR2(100) not null,
age NUMBER(3),
sex VARCHAR2(1),
grade NUMBER(4),
description VARCHAR2(1000)
);
-- Create/Recreate indexes
create index STUDENT_AGE_INDEX on T_STUDENT (AGE)
tablespace TEST_DATA
pctfree 10
initrans 2
maxtrans 255
storage
(
initial 64K
next 1M
minextents 1
maxextents unlimited
);
create unique index STUDENT_NAME_INDEX on T_STUDENT (NAME)
tablespace TEST_DATA
pctfree 10
initrans 2
maxtrans 255
storage
(
initial 64K
next 1M
minextents 1
maxextents unlimited
);
-- Create/Recreate primary, unique and foreign key constraints
alter table T_STUDENT
add primary key (GID)
using index
tablespace TEST_DATA
pctfree 10
initrans 2
maxtrans 255
storage
(
initial 64K
next 1M
minextents 1
maxextents unlimited
);
SELECT * FROM t_student WHERE name = 111;
等同于
SELECT * FROM t_student WHERE to_number(name) = 111;
SQL> EXPLAIN PLAN
2 FOR SELECT * FROM t_student WHERE name = 'chen' OR age = 19;
Explained
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY());
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 3956840407
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 |
| 1 | TABLE ACCESS BY INDEX ROWID | T_STUDENT | 1 | 13 |
| 2 | BITMAP CONVERSION TO ROWIDS | | | |
| 3 | BITMAP OR | | | |
| 4 | BITMAP CONVERSION FROM ROWIDS| | | |
|* 5 | INDEX RANGE SCAN | STUDENT_AGE_INDEX | | |
| 6 | BITMAP CONVERSION FROM ROWIDS| | | |
|* 7 | INDEX RANGE SCAN | STUDENT_NAME_INDEX | | |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("AGE"=19)
7 - access("NAME"='chen')
20 rows selected
SQL> EXPLAIN PLAN
2 FOR SELECT * FROM t_student WHERE name = 'chen' OR sex = '1';
Explained
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY());
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 3237932320
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T_STUDENT | 1 | 13 | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("SEX"='1' OR "NAME"='chen')
13 rows selected
从执行计划可以看到:语句:
SELECT * FROM t_student WHERE name = 'chen' OR age = 19;
OR关系中的列name和age都有被建了索引,所以语句执行时也是走索引的;而语句:
SELECT * FROM t_student WHERE name = 'chen' OR sex = '1';
OR关系中的列sex是没有建立索引的,速印语句执行时不走索引。
SQL> EXPLAIN PLAN
2 FOR SELECT * FROM t_student WHERE name IN ('CHEN', 'LIANG');
Explained
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY());
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 1281816732
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 26 |
| 1 | INLIST ITERATOR | | | |
| 2 | TABLE ACCESS BY INDEX ROWID| T_STUDENT | 2 | 26 |
|* 3 | INDEX UNIQUE SCAN | STUDENT_NAME_INDEX | 2 | |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("NAME"='CHEN' OR "NAME"='LIANG')
15 rows selected
SQL> EXPLAIN PLAN
2 FOR SELECT * FROM t_student WHERE name NOT IN ('CHEN', 'LIANG');
Explained
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY());
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 3237932320
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 26 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T_STUDENT | 2 | 26 | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("NAME"<>'LIANG' AND "NAME"<>'CHEN')
13 rows selected
(5) 被索引的列使用like时,比较的值最前端使用了通配符,如like '%_'
(6) 被索引的列使用IS NULL或IS NOT NULL(位图索引可以解决这个问题,因为位图索引保存NULL信息)
(7) 被索引的列使用不等于号!=或<>
(8) 被索引的列的值基数太低(即重复值太多)时
比如性别字段sex只有‘男’和‘女’两个值,即便在sex创建索引,也不会走索引的。
(9) 如果查询出的结果占到总数据量的比例很大(一般在15%以上,不是绝对值)
1.4 索引的选择性
1.5 组合索引
1.5.1 组合索引什么时候起作用
create index STUDENT_CONS_INDEX on T_STUDENT (NAME, SEX, AGE, GRADE)
tablespace TEST_DATA
pctfree 10
initrans 2
maxtrans 255
storage
(
initial 64K
next 1M
minextents 1
maxextents unlimited
);
create index STUDENT_NAME_INDEX on T_STUDENT (NAME)
tablespace TEST_DATA
pctfree 10
initrans 2
maxtrans 255
storage
(
initial 64K
next 1M
minextents 1
maxextents unlimited
);
SQL> explain plan
2 for select * from t_student where name='LILEI' and grade = 666; --(name, grade)是student_cons_index索引的前导列
Explained
SQL> select * from table(dbms_xplan.display());
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2397198704
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 1
| 1 | TABLE ACCESS BY INDEX ROWID| T_STUDENT | 1 | 13 | 1
|* 2 | INDEX RANGE SCAN | STUDENT_CONS_INDEX | 1 | | 1
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("NAME"='LILEI' AND "GRADE"=666)
filter("GRADE"=666)
15 rows selected
SQL> explain plan
2 for select * from t_student where sex='1' and grade = 666; --(sex, grade)不是student_cons_index索引的前导列
Explained
SQL> select * from table(dbms_xplan.display());
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 3237932320
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T_STUDENT | 1 | 13 | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("SEX"='1' AND "GRADE"=666)
13 rows selected
1.5.2 组合索引和单列索引选择性比较
select * from t_student where name = 'LILEI';
会选择哪个索引呢?因为组合索引的选择性比单列索引的选择性要好,所以会走student_cons_index索引,执行计划如下:
SQL> explain plan
2 for select * from t_student where name = 'LILEI';
Explained
SQL> select * from table(dbms_xplan.display());
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2397198704
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 2
| 1 | TABLE ACCESS BY INDEX ROWID| T_STUDENT | 1 | 13 | 2
|* 2 | INDEX RANGE SCAN | STUDENT_CONS_INDEX | 1 | | 1
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("NAME"='LILEI')
14 rows selected
1.6 函数索引
CREATE OR REPLACE FUNCTION days_left(p_eff_date DATE)
RETURN NUMBER
IS
BEGIN
RETURN ((p_eff_date - sysdate);
END;
/
CREATE INDEX cust_i_eff_days ON
customers ( days_left(cust_eff_to));