ORA FAQ 性能调整系列之——当索引第一列由序列产生,一个逆序索引有什么用?

原创 2004年10月15日 17:17:00

ORA FAQ 性能调整系列之——
The Oracle (tm) Users' Co-Operative FAQ
Why would a reverse index be useful when the leading column of the index is generated from a sequence ?

Author's name: Norman Dunbar
Author's Email: Oracle@bountifulsolutions.co.uk
 Date written: 18/04/2004
Oracle version(s):

When you store data in an indexed table, certain columns of data are copied into the index alongside the rowid of the data row in the table. The data in the table is stored 'randomly', or at least, not necessarily in the order you put them there.

The index entries, on the other hand, must be stored in order, otherwise the usability of the index is removed. If you could walk through the entries in an index, you would see that they are in order, usually ascending, but since 8i, descending also works.

Entries are stored in order of their internal representation, not necessarily the same as what you see on screen when you SELECT the columns from a table.

If the indexed column(s) contain character data (CHAR, NCHAR, VARCHAR2 or NVARCHR2) then the data will appear on screen exactly as it does in the index. For example, if the column contains 'ORACLE' the index entry will also be 'ORACLE'.
若索引列含有字符数据(CHAR, NCHAR, VARCHAR2或NVARCHR2),那么数据在屏幕上的显示将和在索引中一致。例如,若列中含有“ORACLE”,那么索引项中也是“ORACLE”。

We can use the DUMP command to show us the internal representation of any data type. This command takes four parameters. The first is the data you wish to dump, the second is the base you wish to dump it in. The default is 10 which means that DUMP will display the characters in decimal, or base 10. The other allowed values are 8 (Octal), 16 (Hexadecimal) or 17 (Characters).

The third parameter is the start position in the data you wish to dump from and the final parameter is the amount of data you wish to dump. All but the first parameter have sensible defaults.

Using DUMP, we can see the individual character codes for our 'ORACLE' data :

SQL> select dump('ORACLE',10) from dual;

Typ=96 Len=6: 79,82,65,67,76,69

We can prove that this is correct by converting back from decimal character codes to actual characters :

SQL> select chr(79),chr(82),chr(65),chr(67),chr(76),chr(69) from dual;

- - - - - -

We could have used base 17 to do the same thing :

SQL> select dump('ORACLE',17) from dual;

Typ=96 Len=6: O,R,A,C,L,E

Numeric columns are very much different. The internal format of a number is different from that which appears on screen after a SELECT because the internal format is converted to ASCII format so that it can be displayed. We can see this in the following, first in character format :

SQL> select '1234' as "1234",
  2  dump('1234', 17)
  3  from dual;

1234 DUMP('1234',17)
---- ---------------------
1234 Typ=96 Len=4: 1,2,3,4

Then in internal format :

SQL> select 1234 as "a number",
  2  dump(1234, 17)
  3  from   dual;

a number   DUMP(1234,17)
---------- --------------------
   1234 Typ=2 Len=3: c2,^M,#

The first columns in both examples look identical, but this is only because SQLPlus has converted the internal format of the number 1,234 into the character format so that the display device (the monitor screen) is able to show it. Binary characters have a nasty tendency to disrupt character devices like computer monitors when running in text mode.

Take a look at the second column in the above examples and notice the difference. In the first example we see the individual characters '1', '2', '3' and '4' while the second example shows only three bytes in the internal format of the number 1,234. Lets change the DUMP calls slightly, and do the whole lot in one command :
看看上面例子的第二列并注意不同点。第一个例子中我们看到一个一个字符'1', '2', '3'和'4',而第二个例子只显示数1,234内部格式中的三个字节。让我们稍微修改一下DUMP的调用,用一个指令完成全部比较:

SQL> select '1234' as "1234",
  2  dump('1234', 10),
  3  1234 as "a number",
  4  dump(1234, 10)
  5  from dual;

1234 DUMP('1234',10)     a number   DUMP(1234,10)
---- ------------------------- ---------- ----------------------
1234 Typ=96 Len=4: 49,50,51,52   1234 Typ=2 Len=3: 194,13,35

This time, we see the actual character codes used internally. Once again columns 2 and 4 differ. Column 4 is showing three bytes and these three bytes are the internal binary representation of the number 1,234.

It is this binary representation that is used in the index entry when a number column is indexed.

Take a few minutes and experiment with dumping a few other numbers - stick to integers for now as those are what sequences generate.

SQL> create table test (a number);

Table created.

SQL> begin
  2    for x in 1 .. 1e6
  3    loop
  4     insert into test values (x, substr(dump(x,10), 14));
  5    end loop;
  6  end;
  7  /

PL/SQL procedure successfully completed.

If we have a look at the 'b' column of the table, we can see that each entry is ascending in a similar manner to the 'a' column. Here are the first 20 rows :

SQL> col b format a20 wrap
SQL> select a,b from test where a < 21;

   A B
---------- ----------
   1 193,2
   2 193,3
   3 193,4
   4 193,5
   5 193,6
   6 193,7
   7 193,8
   8 193,9
   9 193,10
  10 193,11
  11 193,12
  12 193,13
  13 193,14
  14 193,15
  15 193,16
  16 193,17
  17 193,18
  18 193,19
  19 193,20
  20 193,21

The entries are very similar and all have the same leading byte.

How sequences affect indexes.
As mentioned above, index entries have to be stored in order, however, the table data need not be. If your indexed column is fed by a sequence, the data will be similar to the 20 rows shown above.

Similar entries will group together in the index, so the index blocks will split as necessary and new entries will end up all hitting the same block until it too fills up and splits.

If you have one person running the application, this isn't too much of a problem. If the application is multi-user then it means that every user will tend to write into the same index block and buffer busy waits will be the outcome as transactions 'queue' to write data to the hottest index block around.

Back in our small test, if you select more data from the test table, you will find that in the 1 million rows, there are only 4 different values for the leading byte on the internal numeric format and even worse, most of the entries in the index have the same leading byte value :

SQL> select substr(b,1,3),count(*)
  2  from test
  3  group by substr(b,1,3);

--- ----------
193   99
194    9900
195  990000
196    1

I cheated and discovered that there was a comma in position 4 of every row in the table that's how I knew to use a three character length in my SUBSTR.

What the above shows is that in an index of 1 million sequential entries, the vast majority have the same leading byte and so will all be trying to get into the same block in the index.

How reverse indexes cure the problem.
A reverse key index stores the bytes of the indexed column(s) in reverse order, so the data 'ORACLE' is actually stored in the index as 'ELCARO'. Using a reverse index on a column fed by a sequence spreads the location of sequential numbers across a wider range of leaf blocks and the problem of a single hot block is removed because the index entries are stored in reverse order.

SQL> alter table test add (c varchar2(30));

Table altered.

SQL> update test set c = substr(dump(reverse(a),10),14);

1000000 rows updated.

SQL> select substr(c,1,instr(c,',')-1),count(*)
  2 from test
  3 group by substr(c,1,instr(c,',')-1)
  4 order by to_number(substr(c,1,instr(c,',')-1))

--- ----------
2   10102
3   10101
4   10101
5   10101

All other numbers between 6 and 95 inclusive, have 10,101 entries each.

96   10101
97   10101
98   10101
99   10101
100   10101

99 rows selected.

This time, our 1 million row index entry has it's leading byte value spread across 99 (100 if you include a value for zero) different values, rather than just 4. In addition, the actual reversed bytes are fairly randomly scattered across each of the different values too.

As more entries are added to the index, blocks will be split to accomodate the new entries in their proper location. As the data is arriving almost 'randomly' by means of the reversing of the actual data bytes for the index, the index itself will be extended to accomodate these new values. However, rather than always being stored in the same single 'hot' index block, new entries will be spread across a number of existing blocks (assuming the index has been around for a while) thus reducing contention. Of course, block splits will still occur on these blocks as new values fill up the existing block but it's happening all over the index not just in one place.

This is the reason why reversing the index when its leading column is fed by a sequence reduces buffer contention, removes the hot block problem and by doing so, reduces the potential for buffer busy waits on a multi-user system.

Drawbacks to Reverse Key Indexes
Of course, there are drawbacks as well. By setting up a reverse key index you are increasing the clustering factor of the index. The clustering factor (from USER_INDEXES) is used by the optimiser (CBO) to determine how best to access data in an INDEX RANGE SCAN. If the clustering factor is roughly equal to BLOCKS minus FREE_BLOCKS from USER_TABLES then the chances are that a range scan will read one index block, and locate all (or nearly all) of the data rows in needs in one or more adjacent blocks in the table.
当然,也有不足之处。设置一个逆序索引,增加了索引的簇因子(Clustering Factor)。优化器(CBO)利用簇因子(来自USER_INDEXES)来决定在一次INDEX RANGE SCAN中如何最好的访问数据。若簇因子几乎与BLOCKS - FREE_BLOCKS(来自USER_TABLES)相近,一个区间扫描将可能读一个索引块,然后根据需要在一个或更多的临近块中定位所有(或接近所有)数据行。

On the other hand, if the clustering factor is close to NUM_ROWS in USER_TABLES then the chances are that the entries stored together in one index block are likely to be scattered throughout a wide range of table blocks - so the index range scan may not be chosen as a good method of access.

Obviously the above applies to an analysed table and index.


In a quick test on a table with 100,000 rows loaded using a sequence, a normal index was used for most queries returning up to 30 rows, as was the reverse key index, however, when the number of rows went up to 1,000,000 the reverse key index was never used and a full table scan was used every time.

Further reading:
Oracle reference manual for your version of Oracle.

本文翻译自http://www.jlcomp.demon.co.uk/faq/autotrace_pruning.html  译者仅保留翻译版权

使用倒序索引提升ORDER BY DESC性能

使用倒序索引(INDEX DESC),可以大幅提升带有order by desc子句的SQL语句性能。  一、场景 1、表名:test_t,有一字段名为object_id 2、总数据量:5800...
  • csmnjk
  • csmnjk
  • 2016年10月18日 17:45
  • 1436


本文只针对mysql进行相关方面的说明。 首先我要问,“数据索引是有序无序的?”。答:“当然是有序的了。” 再问,一个SQL语句可以有几个索引?你先别往下看结果,你这想下,可以使用几个索引...
  • qq_36976949
  • qq_36976949
  • 2016年12月16日 20:25
  • 468


毫无疑问,给表添加索引是有好处的,你要做的大部分工作就是维护索引,在数据更改期间索引可能产生碎片,所以一些维护是必要的。碎片可能是你查询产生性能问题的来源。 a8din}0ITPUB个人空间...
  • dingxingmei
  • dingxingmei
  • 2013年10月30日 15:14
  • 1102


1.单列索引满足需求时不用复合索引; 2.create index idx on table(col1,col2,col3) ; 列顺序:区分度大的,字段长的放前面; 3.当where 条件里用到了c...
  • zhq200902
  • zhq200902
  • 2017年02月06日 17:06
  • 723


一、索引是什么 索引是与表或视图关联的磁盘上结构,可以加快从表或视图中检索行的速度。索引包含由表或视图中的一列或多列生成的键。这些键存储在一个结构(B树)中,使 SQL Server 可以快速有...
  • w_xuexi666
  • w_xuexi666
  • 2016年12月27日 16:56
  • 3550


一、索引 MongoDB 提供了多样性的索引支持索引信息被保存在 system.indexes 中且默认总是为 _id 创建索引。 1、基础索引在字段 age 上创建索引 1 (升序); -1 (...
  • MitKey
  • MitKey
  • 2016年02月18日 17:44
  • 2236


三目运算来计算总页数 totalpage=sum/pagesize+sum%pagesize==0?0:1;//计算总页数,sum为总记录数 2.第page页的记录的起始位置和结束位置分别为: p...
  • u011018979
  • u011018979
  • 2014年05月11日 23:46
  • 1148

ORA FAQ 性能调整系列之——压缩索引会提高性能么?

Will compressing my indexes improve performance ?压缩索引会提高性能么?Authors name: Jonathan LewisAuthors Emai...
  • ern
  • ern
  • 2004年10月20日 11:03
  • 3455

索引的升序 降序

  • zhangxiaobo5152
  • zhangxiaobo5152
  • 2016年01月12日 16:55
  • 239


序列 序列是一种用来生成唯一数字值的数据库对象,是一种高效率获得唯一键值的途径。 序列的值由Oracle程序按递增或递减顺序自动生成,通常用来自动生成表的主键值。 序列是独立的数据库对象,并不依...
  • mafan121
  • mafan121
  • 2014年09月21日 18:13
  • 588
您举报文章:ORA FAQ 性能调整系列之——当索引第一列由序列产生,一个逆序索引有什么用?