[精华] 复合索引branch block上存储几个列的信息 [复制链接]

这是很早之前研究的问题,已经忘记了是什么案例导致去研究这个问题。

不过这个问题本省就很有意思,当在两个或者多个列上创建索引时,究竟branch block中会纪录几个列的信息?

首先我们知道branch block中存储结构大致是这样的 pointer - value - pointer - value - pointer.其中pointer指向leaf block或者下一层branch block.

那么这个value的值就是为了区分左右指针所指向的两个leaf block中(或者下一层branch block中)的值,这样从root -> branch -> leaf的时候才能准确定位到leaf block.
在提供答案前,首先给出几种不同的论点

1.不管建立在多少列上,branch block中只存储leading column信息

2. 如果建立在n个列上,branch block中存储n-1列的信息

3. branch block上存储所有列的信息.

这三种观点都有问题,首先观点一

如果这个索引的leading column的可选择性很差,例如status或者flag这样的列,distinct value很少

这时候如果索引建立在 (status, id)上,对于

select * from table where status=0 and id=12345这样的查询,range scan并不能准确定位到 leaf block

对于观点二,在n=2的时候显然也可以用上面的例子来推翻

对于观点三,是可以正确定位到leaf block,但是如果对于像

index(id,name) 其中name 假定name字段类型为varchar(1000),这样的索引效率也不好

事实上因为id列可选择性很好,一般不需要再存储name值到branch block中就可以区分左右的block

在branch block中存储的name列的值浪费了branch block的空间,

而我们知道单个branch block中能够存储值的多少直接影响了树的高度,而树的高度显然直接影响了index的效率

通过上面的分析,以上的三种观点都是错误的,那么oracle究竟如何来存储列信息才能既做到准确定位leaf block又不存储多余的值呢?

这里首先给出试验的结果,我没有找到官方文档或者其他的文章来解释这个结论,不过通过试验来看结论是很清楚的,而且也符合上面提到的两种考虑因素。

其实刚刚我们被限制住的原因是一直认为branch block中存储的列的个数是个常数,但事实上存储几个列是变化的。

这个值只需要区分其左右指针所指向的两个block的值就可以了,

更简单的说,当oracle从上到下走到这个branch block的时候,

通过这个值可以准确的知道向左走还是向右走,

举个例子,我们上面一开始提到的索引例子,

第一个是 index(status, id) , 如果左边block的最大值为(0,33),右边block的最小值为(0,35),

因为此时单纯从第一列不能区分两个block,这时就需要存储两列的信息

如果左边block最大值为(0,100),右边block最小值为(1,1)

因为此时从第一列就可以区分出两个block,所以只需要存储第一列的信息

这样对于

select * from table where status=0 and id=12345

查询,通过branch block中的值既可以知道“向左走,向右走”,准确定位leaf block

又可以避免不必要的浪费branch block空间,避免树高度不必要的增高

下面是试验的过程,直接copy以前自己发的邮件,懒得翻译了

We discussed the internal storage rule for composite index last time, but we didn’t get conclusion yet.

So I did a test for this.

SQL> create table test(n1 int,n2 int ,n3 int,v1 varchar2(2000),v2 varchar2(2000),v3 varchar2(2000));

Table created.

SQL> begin
2 for i in 1..1000 loop
3 insert into test values(0,i,i,rpad(’a',2000),rpad(’a',2000),rpad(’a',2000));
4 end loop;
5 end;
6 /

PL/SQL procedure successfully completed.

insert 1000 rows, the format is like this
0,1,1,xxxxxx,xxxxxxxxxx
0,2,2,xxxxxx,xxxxxxxxxx
0,3,3, xxxxxx,xxxxxxxxxx
……..
0,1000,1000, xxxxxx,xxxxxxxxxx

SQL> commit;

Commit complete.

create 7 indexes on this table, the naming rule is index_

_N (number) V (varchar)

SQL> create index index_2_nv on test(n1,v1);

Index created.

SQL> create index index_2_vn on test(v1,n1);

Index created.

SQL> create index index_3_nnv on test(n1,n2,v1);

Index created.

SQL> create index index_3_nvn on test(n1,v1,n2);

Index created.

SQL> create index index_4_nnnv on test(n1,n2,n3,v1);

Index created.

SQL> create index index_4_nnvn on test(n1,n2,v1,n3);

Index created.

SQL> create index index_5_nnnvv on test (n1,n2,n3,v1,v2);

Index created.

SQL> analyze table test compute statistics;

Table analyzed.

SQL> select index_name,PREFIX_LENGTH,BLEVEL,LEAF_BLOCKS from user_indexes
2 where lower(index_name) in (’index_2_nv’,'index_2_vn’,'index_3_nnv’,'index_3_nvn’,
3 ‘index_4_nnnv’,'index_4_nnvn’,'index_5_nnnvv’);

INDEX_NAME PREFIX_LENGTH BLEVEL LEAF_BLOCKS
—————————— ————- ———- ———–
INDEX_2_NV 5 334
INDEX_2_VN 5 334
INDEX_3_NNV 1 334
INDEX_3_NVN 5 334
INDEX_4_NNNV 1 334
INDEX_4_NNVN 1 334
INDEX_5_NNNVV 2 1000

7 rows selected.

You see, three indexes (INDEX_2_NV, INDEX_2_VN, INDEX_3_NVN) have level 5. This means these indexes’ branch block contains varchar2000.

Then I dump these indexes’ root block to confirm this.

SQL> select segment_name,file_id,block_id from dba_extents where
2 lower(segment_name) in (’index_2_nv’,'index_2_vn’,'index_3_nnv’,'index_3_nvn’,
‘index_4_nnnv’,'index_4_nnvn’,'index_5_nnnvv’) and extent_id=0 order by segment_name 3 ;

SEGMENT_NAME FILE_ID BLOCK_ID
—————————— ———- ———-
INDEX_2_NV 9 16082
INDEX_2_VN 9 16097
INDEX_3_NNV 9 13587
INDEX_3_NVN 1 22053
INDEX_4_NNNV 1 32702
INDEX_4_NNVN 1 62230
INDEX_5_NNNVV 1 61697

7 rows selected.
SQL> @temp.sql
SQL> alter system dump datafile 9 block 16083;
System altered.

SQL> alter system dump datafile 9 block 16098;
System altered.

SQL> alter system dump datafile 9 block 13588;
System altered.

SQL> alter system dump datafile 1 block 22054;
System altered.

SQL> alter system dump datafile 1 block 32703;
System altered.

SQL> alter system dump datafile 1 block 62231;
System altered.

SQL> alter system dump datafile 1 block 61698;
System altered.

.

It’s weird. The index INDEX_2_NV and INDEX_2_VN only have 2 columns. But in the root block dump file, we can find col0,col1 and col2(maybe something related to rowid, I guess).

I am not sure about what the third col is, but it’s clear that the number of columns in branch block is related about the data.

0,1,1,xxxxxx,xxxxxxxxxx
0,2,2,xxxxxx,xxxxxxxxxx
0,3,3, xxxxxx,xxxxxxxxxx
……..
0,1000,1000, xxxxxx,xxxxxxxxxx

The index INDEX_2_NV, INDEX_2_VN can’t distinguish the leaf blocks using the prefix two columns, so it added the third column (unknown, maybe something related to rowid, I guess).

The indexes INDEX_3_NNV, INDEX_4_NNNV, INDEX_4_NNVN and INDEX_5_NNNVV can distinguish the leaf blocks using the prefix two columns, so the branch blocks only contain 2 columns.

The index INDEX_3_NVN can should use three columns to distinguish the left/right leaf blocks, so it contains 3 columns (because the prefix two columns’ values are the same).

You can check the dump file xfan_ora_25936.trc for details.

So I guess that the number of columns stored in branch block is related to linked leaf blocks’ data.
The columns which stored in branch blocks need to distinguish the left/right linked leaf blocks.
If the rule for composite index is like I guessed, then the columns stored in branch blocks maybe different in one index branch block!!

So I did another test to prove this:

SQL> create table test2(n1 int,n2 int,n3 int,v1 varchar2(2000),v2 varchar2(2000));

Table created.

SQL> create index test_idx on test2(n1,n2,n3,v1,v2);

Index created.

SQL> begin
2 for i in 1..1000 loop
insert into test2 values(0,0,i,rpad(’a',2000),rpad(’a',2000));
end loop;
end;
/ 3 4 5 6

PL/SQL procedure successfully completed.

Insert first 1000 rows like this:

0, 0, 1, xxxxx,xxxxxx
0, 0, 2, xxxxx,xxxxxx
0, 0, 3, xxxxx,xxxxxx
……
0, 0, 1000, xxxxx,xxxxxx

SQL> SQL>
SQL> begin
2 for i in 1001..2000 loop
insert into test2 values(0,i,i,rpad(’a',2000),rpad(’a',2000));
end loop;
end;
/ 3 4 5 6

Then insert other 1000 rows like this:
0,1,1,xxxx,xxxx
0,2,2,xxxx,xxxx
0,3,3,xxxx,xxxx
…….
0,1000,1000,xxxx,xxxx

Then I did treedump and branch block dump.

Here is a snippet of the branch block dump:

row#454[2145] dba: 4218395=0×405e1b
col 0; len 1; (1): 80
col 1; len 1; (1): 80
col 2; len 3; (3): c2 0a 64
col 3; TERM
row#455[2133] dba: 4218396=0×405e1c
col 0; len 1; (1): 80
col 1; len 1; (1): 80
col 2; len 2; (2): c2 0b
col 3; TERM
row#456[2124] dba: 4218397=0×405e1d
col 0; len 1; (1): 80
col 1; len 1; (1): c2
col 2; TERM
row#457[2113] dba: 4218398=0×405e1e
col 0; len 1; (1): 80
col 1; len 3; (3): c2 0b 03
col 2; TERM
row#458[2102] dba: 4218399=0×405e1f
col 0; len 1; (1): 80
col 1; len 3; (3): c2 0b 04
col 2; TERM

You see, the numbers of columns changed in one branch block!!

The attachment index.trc is the dump file

So the conclusion is that the number of columns stored in branch block is related to linked leaf blocks’ data.

The branch block will store as less columns as possible to distinguish the left/right leaf blocks it linked
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值