2013-08-07 星期三 下午

-----------------位图索引--------------------------------------


当设计一个存放海量数据的数据库时,需要注意的是,几乎很少用主键,表都是事实表,

以存放事实数据为目的,在这些大量的数据中,数据冗余是不可避免的,考虑位图索引。


位图索引(bitmap index)——从oracle7开始就已经存在了,用来解决B数索引在查询的时候遇到的一些性能问题,

                       重复的数据太多导致查询性能低下。

        在每一个位图索引的索引键值中,都会保存指向多行的指针,这样位图索引的键值就会比B数索引少很多。


创建位图索引的语法:


SQL> create table emp as select * from scott.emp;


Table created.


SQL> create bitmap index emp_bit_ind on emp(job);


Index created.


select * from user_indexes where index_name=upper('emp_bit_ind')  --位图索引


查询位图索引的结构:

analyze index emp_bit_ind validate structure;

select * from index_stats


位图索引的结构


行值        1    2    3   4   5    6    7     8    9   10  11    12   13    14

--------------------------------------------

CLERK       1    0    0   0   0    0    0     0    0    0   1    1     0     1

SALESMAN    0    1    1   0   1    0    0     0    0    1   0    0     0     0

PRESIDENT   0    0    0   0   0    0    0     0    1    0   0    0     0     0

MANAGER     0    0    0   1   0    1    1     0    0    0   0    0     0     0

ANALYST     0    0    0   0   0    0    0     1    0    0   0    0     1     0

---------------------------------------------

创建的位图索引创建了5个键值,但是对应14行数据

创建B数索引的话,要14个索引键值,14个叶子行


从上面的结果可以看出,位图索引比B树索引节省空间,位图索引非产适合于作逻辑运算操作以及聚合运算的COUNT操作


创建一个比较大的表

SQL> create table t7 as select * from all_objects;


Table created.


SQL> insert into t7 select * from t7;


40935 rows created.


SQL> commit;


Commit complete.


SQL> select count(1) from t7;


 COUNT(1)

----------

    81870


SQL> insert into t7 select * from t7;


81870 rows created.


SQL> commit;


Commit complete.


SQL> select count(1) from t7;


 COUNT(1)

----------

   163740


SQL> create bitmap index t7_bit_ind on t7(object_type); --T7上创建位图索引


Index created.


SQL> create table t8 as select * from t7;


Table created.


SQL> create index t8_bit_ind on t8(object_type);  --T8上创建B树索引


Index created.


SQL> exec dbms_stats.gather_table_stats(user,'t7',cascade=>true);


PL/SQL procedure successfully completed.


SQL> exec dbms_stats.gather_table_stats(user,'t8',cascade=>true);


PL/SQL procedure successfully completed.


查看两个索引占用的存储空间:

SQL> set linesize 10000

SQL> select segment_name,bytes from user_segments where segment_type='INDEX' and (segment_name like '%T7%' or segment_name like '%T8%');


SEGMENT_NAME                                                                           BYTES

--------------------------------------------------------------------------------- ----------

T7_BIT_IND                                                                            131072

T8_BIT_IND                                                                           4194304


显然位图索引比B树索引占用的空间少很多。


SQL> set autotrace trace exp

SQL> select count(1) from t7 where object_type='TABLE';


Execution Plan

----------------------------------------------------------

Plan hash value: 277922936


------------------------------------------------------------------------------------------

| Id  | Operation                   | Name       | Rows  | Bytes | Cost (%CPU)| Time     |

------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT            |            |     1 |    10 |     1   (0)| 00:00:01 |

|   1 |  SORT AGGREGATE             |            |     1 |    10 |            |          |

|   2 |   BITMAP CONVERSION COUNT   |            |  9115 | 91150 |     1   (0)| 00:00:01 |

|*  3 |    BITMAP INDEX SINGLE VALUE| T7_BIT_IND |       |       |            |          |

------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):

---------------------------------------------------


  3 - access("OBJECT_TYPE"='TABLE')


Statistics

----------------------------------------------------------

         0  recursive calls

         0  db block gets

         2  consistent gets

         0  physical reads

         0  redo size

       412  bytes sent via SQL*Net to client

       400  bytes received via SQL*Net from client

         2  SQL*Net roundtrips to/from client

         0  sorts (memory)

         0  sorts (disk)

         1  rows processed


SQL> select count(1) from t8 where object_type='TABLE';


Execution Plan

----------------------------------------------------------

Plan hash value: 368075426


--------------------------------------------------------------------------------

| Id  | Operation         | Name       | Rows  | Bytes | Cost (%CPU)| Time     |

--------------------------------------------------------------------------------

|   0 | SELECT STATEMENT  |            |     1 |    10 |    28   (0)| 00:00:01 |

|   1 |  SORT AGGREGATE   |            |     1 |    10 |            |          |

|*  2 |   INDEX RANGE SCAN| T8_BIT_IND |  9082 | 90820 |    28   (0)| 00:00:01 |

--------------------------------------------------------------------------------


Predicate Information (identified by operation id):

---------------------------------------------------


  2 - access("OBJECT_TYPE"='TABLE')


Statistics

----------------------------------------------------------

         0  recursive calls

         0  db block gets

         4  consistent gets

         1  physical reads

         0  redo size

       412  bytes sent via SQL*Net to client

       400  bytes received via SQL*Net from client

         2  SQL*Net roundtrips to/from client

         0  sorts (memory)

         0  sorts (disk)

         1  rows processed


 在创建有位图的索引的表上以创建索引的字段作为where条件,作count运算,从执行计划来看,

oracle选择访问位图索引代价很低。


SQL> select count(1) from t7 where object_type='TABLE' or object_type='INDEX';



Execution Plan

----------------------------------------------------------

Plan hash value: 285551327


-------------------------------------------------------------------------------------------

| Id  | Operation                    | Name       | Rows  | Bytes | Cost (%CPU)| Time     |

-------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT             |            |     1 |    10 |     2   (0)| 00:00:01 |

|   1 |  SORT AGGREGATE              |            |     1 |    10 |            |          |

|   2 |   INLIST ITERATOR            |            |       |       |            |          |

|   3 |    BITMAP CONVERSION COUNT   |            | 18230 |   178K|     2   (0)| 00:00:01 |

|*  4 |     BITMAP INDEX SINGLE VALUE| T7_BIT_IND |       |       |            |          |

-------------------------------------------------------------------------------------------


SORT AGGREGATE——排序后聚合操作

BITMAP CONVERSION COUNT——在位图索引上作count运算

BITMAP INDEX SINGLE VALUE——扫表位图索引的单值


Predicate Information (identified by operation id):

---------------------------------------------------


  4 - access("OBJECT_TYPE"='INDEX' OR "OBJECT_TYPE"='TABLE')



Statistics

----------------------------------------------------------

         1  recursive calls

         0  db block gets

         4  consistent gets

         0  physical reads

         0  redo size

       412  bytes sent via SQL*Net to client

       400  bytes received via SQL*Net from client

         2  SQL*Net roundtrips to/from client

         0  sorts (memory)

         0  sorts (disk)

         1  rows processed


SQL> select count(1) from t8 where object_type='TABLE' or object_type='INDEX';



Execution Plan

----------------------------------------------------------

Plan hash value: 2716926154


---------------------------------------------------------------------------------

| Id  | Operation          | Name       | Rows  | Bytes | Cost (%CPU)| Time     |

---------------------------------------------------------------------------------

|   0 | SELECT STATEMENT   |            |     1 |    10 |    53   (0)| 00:00:01 |

|   1 |  SORT AGGREGATE    |            |     1 |    10 |            |          |

|   2 |   INLIST ITERATOR  |            |       |       |            |          |

|*  3 |    INDEX RANGE SCAN| T8_BIT_IND | 18164 |   177K|    53   (0)| 00:00:01 |

---------------------------------------------------------------------------------


INLIST ITERATOR——列表过滤器


相当于把这个SQL做了改写:


select count(1) from t8 where object_type in('TABLE','INDEX');


Predicate Information (identified by operation id):

---------------------------------------------------


  3 - access("OBJECT_TYPE"='INDEX' OR "OBJECT_TYPE"='TABLE')



Statistics

----------------------------------------------------------

         1  recursive calls

         0  db block gets

         7  consistent gets

         0  physical reads

         0  redo size

       412  bytes sent via SQL*Net to client

       400  bytes received via SQL*Net from client

         2  SQL*Net roundtrips to/from client

         0  sorts (memory)

         0  sorts (disk)

         1  rows processed


行值        1    2    3   4   5    6    7     8    9   10   11   12    13    14

-----------------------------------------------

TABLE       1    0    0   0   0    0    0     0    0    0   1    1     0     1

INDEX       0    1    1   0   1    0    0     0    0    1   0    0     0     0

-----------------------------------------------

OR运算      1    1    1   0   1    0    0     0    0    1   1    1     0     1


原理:由于位图索引的每个键值保存了一组对应的记录,当对不同的键值作and或者or操作的时候,

     只需要对这些信息在各行的分布进行计算(实际就是0和1的位运算),得出结果。


位图索引对于重复数据进行and或者or操作,比B树索引性能更优,这是由位图索引的结构来决定的。


案例:

SQL> select count(1) from emp where job='CLERK' or JOB='MANAGER';



Execution Plan

----------------------------------------------------------

Plan hash value: 3902482824


---------------------------------------------------------------------------------------------

| Id  | Operation                     | Name        | Rows  | Bytes | Cost (%CPU)| Time     |

---------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT              |             |     1 |     6 |     1   (0)| 00:00:01 |

|   1 |  SORT AGGREGATE               |             |     1 |     6 |            |          |

|   2 |   BITMAP CONVERSION COUNT     |             |     7 |    42 |     1   (0)| 00:00:01 |

|*  3 |    BITMAP INDEX FAST FULL SCAN| EMP_BIT_IND |       |       |            |          |

---------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):

---------------------------------------------------


  3 - filter("JOB"='CLERK' OR "JOB"='MANAGER')


Note

-----

  - dynamic sampling used for this statement



Statistics

----------------------------------------------------------

         5  recursive calls

         0  db block gets

         8  consistent gets

         0  physical reads

         0  redo size

       411  bytes sent via SQL*Net to client

       400  bytes received via SQL*Net from client

         2  SQL*Net roundtrips to/from client

         0  sorts (memory)

         0  sorts (disk)

         1  rows processed

行值        1    2    3   4   5    6    7     8    9   10   11   12   13    14

-----------------------------------------------

CLERK       1    0    0   0   0    0    0     0    0    0   1    1     0     1

MANAGER     0    0    0   1   0    1    1     0    0    0   0    0     0     0

-------------------------------------------------

OR          1    0    0   1   0    1    1     0    0    0   1    1     0     1


SQL> conn hr/hr

Connected.

SQL> select count(1) from emp where job='CLERK' or JOB='MANAGER';


 COUNT(1)

----------

        7



行值        1    2    3   4   5    6    7     8    9    10   11  12   13     14

---------------------------------------------

CLERK       1    0    0   0   0    0    0     0    0    0   1    1     0     1

MANAGER     0    0    0   1   0    1    1     0    0    0   0    0     0     0

---------------------------------------------

AND         0    0    0   0   0    0    0     0    0    0   0    0     0     0


SQL> select count(1) from emp where job='CLERK' and JOB='MANAGER';--and操作是不符合逻辑的


 COUNT(1)

----------

        0


SQL> select count(1) from emp where job='CLERK' and JOB='MANAGER';



Execution Plan

----------------------------------------------------------

Plan hash value: 3353207192


----------------------------------------------------------------------------------------------

| Id  | Operation                      | Name        | Rows  | Bytes | Cost (%CPU)| Time     |

----------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT               |             |     1 |     6 |     0   (0)|          |

|   1 |  SORT AGGREGATE                |             |     1 |     6 |            |          |

|*  2 |   FILTER                       |             |       |       |            |          |

|   3 |    BITMAP CONVERSION COUNT     |             |     4 |    24 |     1   (0)| 00:00:01 |

|*  4 |     BITMAP INDEX FAST FULL SCAN| EMP_BIT_IND |       |       |            |          |

----------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):

---------------------------------------------------


  2 - filter(NULL IS NOT NULL)

  4 - filter("JOB"='CLERK')


****************************************

oracle在底层对SQL做了改写:


select count(1) from emp where NULL IS NOT NULL and job='CLERK'


 SQL语句在ORACLE底层都要被修改的,修改目标是不改变查询结果的逻辑,而让SQL语句性能更优。



Note

-----

  - dynamic sampling used for this statement


按照改写后的看PLAN。结果是一样的。


SQL> select count(1) from emp where NULL IS NOT NULL and job='CLERK';



Execution Plan

----------------------------------------------------------

Plan hash value: 3353207192


----------------------------------------------------------------------------------------------

| Id  | Operation                      | Name        | Rows  | Bytes | Cost (%CPU)| Time     |

----------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT               |             |     1 |     6 |     0   (0)|          |

|   1 |  SORT AGGREGATE                |             |     1 |     6 |            |          |

|*  2 |   FILTER                       |             |       |       |            |          |

|   3 |    BITMAP CONVERSION COUNT     |             |     4 |    24 |     1   (0)| 00:00:01 |

|*  4 |     BITMAP INDEX FAST FULL SCAN| EMP_BIT_IND |       |       |            |          |

----------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):

---------------------------------------------------


  2 - filter(NULL IS NOT NULL)

  4 - filter("JOB"='CLERK')


Note

-----

  - dynamic sampling used for this statement



Statistics

----------------------------------------------------------

         7  recursive calls

         0  db block gets

         6  consistent gets

         0  physical reads

         0  redo size

       410  bytes sent via SQL*Net to client

       400  bytes received via SQL*Net from client

         2  SQL*Net roundtrips to/from client

         0  sorts (memory)

         0  sorts (disk)

         1  rows processed



------------------------------------------------

作查询操作,但是不做聚合运算。


SQL> exec dbms_stats.gather_table_stats(user,'t7',estimate_percent=>100,cascade=>true);


PL/SQL procedure successfully completed.


SQL> exec dbms_stats.gather_table_stats(user,'t8',estimate_percent=>100,cascade=>true);


PL/SQL procedure successfully completed.


SQL> select * from t7 where object_type='TABLE';


528 rows selected.



Execution Plan

----------------------------------------------------------

Plan hash value: 1637879808


-------------------------------------------------------------------------------------------

| Id  | Operation                    | Name       | Rows  | Bytes | Cost (%CPU)| Time     |

-------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT             |            |   528 | 50160 |   114   (1)| 00:00:02 |

|   1 |  TABLE ACCESS BY INDEX ROWID | T7         |   528 | 50160 |   114   (1)| 00:00:02 |

|   2 |   BITMAP CONVERSION TO ROWIDS|            |       |       |            |          |

|*  3 |    BITMAP INDEX SINGLE VALUE | T7_BIT_IND |       |       |            |          |

-------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):

---------------------------------------------------


  3 - access("OBJECT_TYPE"='TABLE')



Statistics

----------------------------------------------------------

         1  recursive calls

         0  db block gets

       138  consistent gets

         0  physical reads

         0  redo size

     30906  bytes sent via SQL*Net to client

       785  bytes received via SQL*Net from client

        37  SQL*Net roundtrips to/from client

         0  sorts (memory)

         0  sorts (disk)

       528  rows processed


SQL> select * from t8 where object_type='TABLE';


528 rows selected.



Execution Plan

----------------------------------------------------------

Plan hash value: 1051944969


------------------------------------------------------------------------------------------

| Id  | Operation                   | Name       | Rows  | Bytes | Cost (%CPU)| Time     |

------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT            |            |   528 | 50160 |    23   (0)| 00:00:01 |

|   1 |  TABLE ACCESS BY INDEX ROWID| T8         |   528 | 50160 |    23   (0)| 00:00:01 |

|*  2 |   INDEX RANGE SCAN          | T8_BIT_IND |   528 |       |     4   (0)| 00:00:01 |

------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):

---------------------------------------------------


  2 - access("OBJECT_TYPE"='TABLE')



Statistics

----------------------------------------------------------

         1  recursive calls

         0  db block gets

       179  consistent gets

         0  physical reads

         0  redo size

     30906  bytes sent via SQL*Net to client

       785  bytes received via SQL*Net from client

        37  SQL*Net roundtrips to/from client

         0  sorts (memory)

         0  sorts (disk)

       528  rows processed


SQL> select /*+ index(t8 t8_bit_ind)*/ * from t8 where object_type='TABLE';


528 rows selected.



Execution Plan

----------------------------------------------------------

Plan hash value: 1051944969


------------------------------------------------------------------------------------------

| Id  | Operation                   | Name       | Rows  | Bytes | Cost (%CPU)| Time     |

------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT            |            |   528 | 50160 |    23   (0)| 00:00:01 |

|   1 |  TABLE ACCESS BY INDEX ROWID| T8         |   528 | 50160 |    23   (0)| 00:00:01 |

|*  2 |   INDEX RANGE SCAN          | T8_BIT_IND |   528 |       |     4   (0)| 00:00:01 |

------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):

---------------------------------------------------


  2 - access("OBJECT_TYPE"='TABLE')



Statistics

----------------------------------------------------------

         1  recursive calls

         0  db block gets

       179  consistent gets

         0  physical reads

         0  redo size

     30906  bytes sent via SQL*Net to client

       785  bytes received via SQL*Net from client

        37  SQL*Net roundtrips to/from client

         0  sorts (memory)

         0  sorts (disk)

       528  rows processed


 如果不做聚合运算,还是B树索引的性能比较高。


结论:位图索引在有聚合操作的时候且键值重复率高的情况下,才能体现其优势,

     主要用在OLAP系统上,OLTP是不用的。