列基数与列选择性、集簇因子、直方图

<pre name="code" class="sql">一、基数:
一个列中唯一键(Distinct_keys)的个数,如有一个100W行的表,性别列的基数为2 (select distinct xx from t),主键列的基数为100W(select distinct id from t);

二、选择性:
基数/总行数所占的百分比,性别 2/100w * 100% 主键 100%  选择性越高 越有利于使用索引20~30%就算是比较高了
 

select a.column_name,  
       b.num_rows,  
       a.num_distinct Cardinality,  
       round(a.num_distinct/b.num_rows * 100, 2) selectivity 
  from dba_tab_col_statistics a, dba_tables b  
 where a.owner = b.owner  
   and a.table_name = b.table_name  
   and a.owner = upper('&owner')  
   and a.table_name = upper('&table_name')  
如何查看列的选择性和基数呢?
实验:(linux——11g)
<pre name="code" class="sql">create table t as select * from dba_objects;
create index ind_t_id on t(object_id);
create index ind_t_name on t(object_name);
create index ind_t_owner on t(object_owner);
列的基数
SCOTT@fyl>select count(distinct object_id),count(distinct owner) from t;

COUNT(DISTINCTOBJECT_ID) COUNT(DISTINCTOWNER)
------------------------ --------------------
                   65494                   19
列的统计信息
SCOTT@fyl>select TABLE_NAME,COLUMN_NAME,NUM_DISTINCT,trunc(DENSITY,3),NUM_NULLS,NUM_BUCKETS,HISTOGRAM from user_tab_col_statistics where table_name='T';
TABLE_NAME COLUMN_NAME                    NUM_DISTINCT TRUNC(DENSITY,3)  NUM_NULLS NUM_BUCKETS HISTOGRAM
---------- ------------------------------ ------------ ---------------- ---------- ----------- ---------------
T          OWNER                                    19             .052          0           1 NONE
T          OBJECT_NAME                           38000                0          0           1 NONE
T          SUBOBJECT_NAME                           69             .014      65255           1 NONE
T          OBJECT_ID                             65494                0          0           1 NONE
T          DATA_OBJECT_ID                         5051                0      60404           1 NONE
T          OBJECT_TYPE                              42             .023          0           1 NONE
T          CREATED                                1111                0          0           1 NONE
T          LAST_DDL_TIME                          1142                0          0           1 NONE
T          TIMESTAMP                              1218                0          0           1 NONE
T          STATUS                                    2               .5          0           1 NONE
T          TEMPORARY                                 2               .5          0           1 NONE
T          GENERATED                                 2               .5          0           1 NONE
T          SECONDARY                                 2               .5          0           1 NONE
T          NAMESPACE                                19             .052          0           1 NONE
T          EDITION_NAME                              0                0      65494           0 NONE
列的选择性(OWNER)DENSITY的值
SCOTT@fyl>SELECT 1/19 FROM DUAL;
      1/19
----------
.052631579

对于DENSITY列值
If a histogram is available on COLUMN_NAME, then this column displays the selectivity of a value that spans fewer than 2 endpoints in the histogram. It does not represent the selectivity of values that span 2 or more endpoints.
If a histogram is not available on COLUMN_NAME, then the value of this column is 1/NUM_DISTINCT.

生成直方图后
exec dbms_stats.gather_table_stats(ownname => 'SCOTT',tabname => 'T',estimate_percent => 100 ,method_opt => 'for all columns',cascade => true);

SCOTT@fyl>select TABLE_NAME,COLUMN_NAME,NUM_DISTINCT,trunc(DENSITY,3),NUM_NULLS,NUM_BUCKETS,HISTOGRAM from user_tab_col_statistics where table_name='T';
TABLE_NAME COLUMN_NAME                    NUM_DISTINCT TRUNC(DENSITY,3)  NUM_NULLS NUM_BUCKETS HISTOGRAM
---------- ------------------------------ ------------ ---------------- ---------- ----------- ---------------
T          OWNER                                    19                0          0          19 FREQUENCY
T          OBJECT_NAME                           38762                0          0          75 HEIGHT BALANCED
T          SUBOBJECT_NAME                           69             .002      65255          69 FREQUENCY
T          OBJECT_ID                             65494                0          0          75 HEIGHT BALANCED
T          DATA_OBJECT_ID                         5051                0      60404          75 HEIGHT BALANCED
T          OBJECT_TYPE                              42                0          0          42 FREQUENCY
T          CREATED                                1111                0          0          75 HEIGHT BALANCED
T          LAST_DDL_TIME                          1142                0          0          75 HEIGHT BALANCED
T          TIMESTAMP                              1218                0          0          75 HEIGHT BALANCED
T          STATUS                                    2                0          0           2 FREQUENCY
T          TEMPORARY                                 2                0          0           2 FREQUENCY
T          GENERATED                                 2                0          0           2 FREQUENCY
T          SECONDARY                                 2                0          0           2 FREQUENCY
T          NAMESPACE                                19                0          0          19 FREQUENCY
T          EDITION_NAME                              0                0      65494           0 NONE
三、直方图
当某列数据分布不均衡,为了让CBO能生成最佳的执行计划,我们可能需要对表收集直方图,直方图最大的桶数(Bucket)是254。
OLTP系统中没有必要千万不用收集直方图,因为OLTP一般会用绑定变量,但11g之前一直有绑定变量窥探的问题,导致收集的直方图没有作用;

HISTOGRAM表示直方图的类型:
FREQUECNCY频率直方图、 当列中Distinct_keys 较少(小于254),如果不手工指定<span style="font-family: Arial, Helvetica, sans-serif;">直方图桶数(BUCKET)</span>,Oracle就会自动的创建频率直方图,并且桶数(BUCKET)等于Distinct_Keys。
HEIGHT BALANCED 高度平衡直方图 当列中Distinct_keys大于254,如果不手工指定直方图桶数(BUCKET),Oracle就会自动的创建高度平衡直方图。(默认75)

在有直方图的情况下执行sql
SCOTT@fyl>select * from t where owner='SYS';
30530 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 30530 |  2921K|   262   (1)| 00:00:04 |
|*  1 |  TABLE ACCESS FULL| T    | 30530 |  2921K|   262   (1)| 00:00:04 |
--------------------------------------------------------------------------

SCOTT@fyl>select * from t where owner='SCOTT';  
10 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3630032853

-------------------------------------------------------------------------------------------
| Id  | Operation                   | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |             |    10 |   980 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T           |    10 |   980 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T_OWNER |    10 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------
owner=SYS时查询30530 估算Rows30530 全表扫描  正确
owner-SCOTT时查询10  估算Rows10  索引     正确

去掉直方图(method_opt => 'for all columns  size 1')
SCOTT@fyl>select * from t where owner='SYS';  
30530 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3630032853
-------------------------------------------------------------------------------------------
| Id  | Operation                   | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |             |  3447 |   329K|   102   (0)| 00:00:02 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T           |  3447 |   329K|   102   (0)| 00:00:02 |
|*  2 |   INDEX RANGE SCAN          | IND_T_OWNER |  3447 |       |     9   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

SCOTT@fyl>select * from t where owner='SCOTT';
10 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3630032853
-------------------------------------------------------------------------------------------
| Id  | Operation                   | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |             |  3447 |   329K|   102   (0)| 00:00:02 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T           |  3447 |   329K|   102   (0)| 00:00:02 |
|*  2 |   INDEX RANGE SCAN          | IND_T_OWNER |  3447 |       |     9   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------
owner=SYS时  查询30530 估算Rows3447  <span style="font-family: Arial, Helvetica, sans-serif;">NUM_ROWS / </span><span style="font-family: Arial, Helvetica, sans-serif;">NUM_DISTINCT=NUM_ROWS*DENSITY=69454/19=3447  错误</span>
owner=SCOTT时  查询10  估算Rows3447  <span style="font-family: Arial, Helvetica, sans-serif;">NUM_ROWS </span><span style="font-family: Arial, Helvetica, sans-serif;">/ NUM_DISTINCT=69454/19=3447  错误</span>
直方图告诉CBO这个列是不是均衡的 
1. 没收集直方图 ---CBO认为这个列是分布均匀的;
2. 收集过了 ---告诉CBO这个列数据有问题 分布不均衡  你别算错了 特别是频率直方图算的会很准最终就是影响rows
user_tab_histograms视图可以查询直方图桶数(BUCKET)与数值分布情况

四、聚簇因子
1、堆表的存储方式
Oralce 数据库系统中最普通,最为常用的即为堆表。堆表的数据存储方式为无序存储,也就是任意的DML操作都可能使得当前数据块存在可用的空闲空间。处于节省空间的考虑,块上的可用空闲空间会被新插入的行填充,而不是按顺序填充到最后被使用的块上。上述的操作方式导致了数据的无序性的产生。当创建索引时,会根据指定的列按顺序来填充到索引块,缺省的情况下为升序。新建或重建索引时,索引列上的顺序是有序的,而表上的顺序是无序的,也就是存在了差异,即表现为聚簇因子。

2、什么是聚簇因子(clustering factor/CF)
聚簇因子是基于表上索引列上的一个值,每一个索引都有一个聚簇因子。用于描述索引块上与表块上存储数据在顺序上的相似程度,也就说表上的数据行的存储顺序与索引列上顺序是否一致。在全索引扫描中,CF的值基本上等同于物理I/O或块访问数,如果相同的块被连续读,则Oracle认为只需要1次物理I/O。好的CF值接近于表上的块数,而差的CF值则接近于表上的行数。聚簇因子在索引创建时就会通过表上存存在的行以及索引块计算获得。

3、Oracle 如何计算聚簇因子
执行或预估一次全索引扫描。检查索引块上每一个rowid的值,查看是否前一个rowid的值与后一个指向了相同的数据块,如果指向了不相同的数据块则CF的值增加1。当索引块上的每一个rowid被检查完毕,即得到最终的CF值。

如果ClusteringFactor接近于表存储的块数,说明这张表是按照索引字段的顺序存储的。
如果ClusteringFactor接近于行的数量,那说明这张表不是按索引字段顺序存储的。

实验
新建表
SQL> create table test as select * from emp;
SQL> commit;
新建索引
SQL> create index test_enmno_ind on test (empno);
SQL> select * from test where empno=7788;

Execution Plan
----------------------------------------------------------
Plan hash value: 845540399
---------------------------------------------------------------------------------------------
| Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                |     1 |    32 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TEST           |     1 |    32 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | TEST_ENMNO_IND |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("EMPNO"=7788)
分析表
SQL> analyze table test compute statistics;
查看列值聚簇因子
select t.table_name,t.NUM_ROWS,t.BLOCKS,t.AVG_ROW_LEN,i.index_name,i.CLUSTERING_FACTOR
from dba_tables t,dba_indexes i
where t.table_name=i.table_name
and t.owner='SCOTT'
and t.table_name='TEST';

TABLE_NAME   NUM_ROWS   BLOCKS    AVG_ROW_LEN  INDEX_NAME          CLUSTERING_FACTOR
----------   -----      -------    ----------- ------------------- -----------------
TEST           14        4          40          TEST_ENAME_IND          1
 
14行*40字节=560字节,一个块8192个字节,560/8192=0.0684;一个test表占一个块的6.84%
SQL> alter table test pctfree 93; 
SQL> insert into test select * from emp;
14 rows created.
SQL> /            ----每插一个emp表,都占一个块
14 rows created.
SQL> /    --查10个
SQL> commit;
Commit complete.

SQL> analyze table test compute statistics;
Table analyzed.
SQL> select * from test where empno=7788;
10 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 217508114
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    10 |   320 |     5   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TEST |    10 |   320 |     5   (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("EMPNO"=7788)

SQL> select t.table_name,t.NUM_ROWS,t.BLOCKS,t.AVG_ROW_LEN,i.index_name,i.CLUSTERING_FACTOR
  2  from dba_tables t,dba_indexes i
  3  where t.table_name=i.table_name
  4  and t.owner='SCOTT'
  5  and t.table_name='TEST';

TABLE_NAME   NUM_ROWS     BLOCKS AVG_ROW_LEN INDEX_NAME                     CLUSTERING_FACTOR
------------ ---------- ---------- ----------- ------------------------------ -----------------
TEST         140         16          40 TEST_ENMNO_IND                               140

解决办法:
SQL> create table test_temp as select * from test;
SQL> truncate table test;
SQL> insert into test select * from test_temp order by empno;
SQL> alter index rebuild online;
SQL> drop table test_temp;
SQL> commit;
再次查看:
SQL> select * from test where empno=7788;
10 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 845540399
----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                |    10 |   320 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TEST           |    10 |   320 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | TEST_ENMNO_IND |    10 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("EMPNO"=7788)

SQL> /
TABLE_NAME    NUM_ROWS     BLOCKS AVG_ROW_LEN INDEX_NAME       CLUSTERING_FACTOR
------------ ---------- ---------- ----------- --------------- -----------------
TEST           140         13          40 TEST_ENMNO_IND        16


直方图:
SQL> select * from test where rownum < 10;

     EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
      7369 SMITH      CLERK           7902 17-DEC-80        800                    20

SQL> begin 
  2   for i in 1 .. 10000 loop
  3     insert into test values (7369,'SMITH','CLERK',7902,sysdate,800,1000,20);
  4   end loop;
  5  commit;
  6  end;
  7  /

PL/SQL procedure successfully completed.

SQL> analyze table test compute statistics;
Table analyzed.
查询数据很大,但是还走索引
SQL> select * from test where empno=7369;
10010 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 845540399
----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                |   724 | 22444 |    60   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TEST           |   724 | 22444 |    60   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | TEST_ENMNO_IND |   724 |       |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("EMPNO"=7369)


插数据的时候很集中,所以聚簇因子与块数接近
SQL> /
TABLE_NAME   NUM_ROWS     BLOCKS   AVG_ROW_LEN INDEX_NAME         CLUSTERING_FACTOR
------------ ---------- ---------- ----------- ---------------- -----------------
TEST         10140        810          40 TEST_ENMNO_IND         787

解决办法:
SQL> analyze table test compute statistics for columns empno;


SQL> select * from test where empno=7369;
10010 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 217508114
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 10010 |   303K|   182   (2)| 00:00:03 |
|*  1 |  TABLE ACCESS FULL| TEST | 10010 |   303K|   182   (2)| 00:00:03 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("EMPNO"=7369)



 




评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值