<pre name="code" class="sql">一、基数:
一个列中唯一键(Distinct_keys)的个数,如有一个100W行的表,性别列的基数为2 (select distinct xx from t),主键列的基数为100W(select distinct id from t);
二、选择性:
基数/总行数所占的百分比,性别 2/100w * 100% 主键 100% 选择性越高 越有利于使用索引20~30%就算是比较高了
select a.column_name,
b.num_rows,
a.num_distinct Cardinality,
round(a.num_distinct/b.num_rows * 100, 2) selectivity
from dba_tab_col_statistics a, dba_tables b
where a.owner = b.owner
and a.table_name = b.table_name
and a.owner = upper('&owner')
and a.table_name = upper('&table_name')
如何查看列的选择性和基数呢?
实验:(linux——11g)
<pre name="code" class="sql">create table t as select * from dba_objects;
create index ind_t_id on t(object_id);
create index ind_t_name on t(object_name);
create index ind_t_owner on t(object_owner);
列的基数
SCOTT@fyl>select count(distinct object_id),count(distinct owner) from t;
COUNT(DISTINCTOBJECT_ID) COUNT(DISTINCTOWNER)
------------------------ --------------------
65494 19
列的统计信息
SCOTT@fyl>select TABLE_NAME,COLUMN_NAME,NUM_DISTINCT,trunc(DENSITY,3),NUM_NULLS,NUM_BUCKETS,HISTOGRAM from user_tab_col_statistics where table_name='T';
TABLE_NAME COLUMN_NAME NUM_DISTINCT TRUNC(DENSITY,3) NUM_NULLS NUM_BUCKETS HISTOGRAM
---------- ------------------------------ ------------ ---------------- ---------- ----------- ---------------
T OWNER 19 .052 0 1 NONE
T OBJECT_NAME 38000 0 0 1 NONE
T SUBOBJECT_NAME 69 .014 65255 1 NONE
T OBJECT_ID 65494 0 0 1 NONE
T DATA_OBJECT_ID 5051 0 60404 1 NONE
T OBJECT_TYPE 42 .023 0 1 NONE
T CREATED 1111 0 0 1 NONE
T LAST_DDL_TIME 1142 0 0 1 NONE
T TIMESTAMP 1218 0 0 1 NONE
T STATUS 2 .5 0 1 NONE
T TEMPORARY 2 .5 0 1 NONE
T GENERATED 2 .5 0 1 NONE
T SECONDARY 2 .5 0 1 NONE
T NAMESPACE 19 .052 0 1 NONE
T EDITION_NAME 0 0 65494 0 NONE
列的选择性(OWNER)DENSITY的值
SCOTT@fyl>SELECT 1/19 FROM DUAL;
1/19
----------
.052631579
对于DENSITY列值
If a histogram is available on COLUMN_NAME, then this column displays the selectivity of a value that spans fewer than 2 endpoints in the histogram. It does not represent the selectivity of values that span 2 or more endpoints.
If a histogram is not available on COLUMN_NAME, then the value of this column is 1/NUM_DISTINCT.
生成直方图后
exec dbms_stats.gather_table_stats(ownname => 'SCOTT',tabname => 'T',estimate_percent => 100 ,method_opt => 'for all columns',cascade => true);
SCOTT@fyl>select TABLE_NAME,COLUMN_NAME,NUM_DISTINCT,trunc(DENSITY,3),NUM_NULLS,NUM_BUCKETS,HISTOGRAM from user_tab_col_statistics where table_name='T';
TABLE_NAME COLUMN_NAME NUM_DISTINCT TRUNC(DENSITY,3) NUM_NULLS NUM_BUCKETS HISTOGRAM
---------- ------------------------------ ------------ ---------------- ---------- ----------- ---------------
T OWNER 19 0 0 19 FREQUENCY
T OBJECT_NAME 38762 0 0 75 HEIGHT BALANCED
T SUBOBJECT_NAME 69 .002 65255 69 FREQUENCY
T OBJECT_ID 65494 0 0 75 HEIGHT BALANCED
T DATA_OBJECT_ID 5051 0 60404 75 HEIGHT BALANCED
T OBJECT_TYPE 42 0 0 42 FREQUENCY
T CREATED 1111 0 0 75 HEIGHT BALANCED
T LAST_DDL_TIME 1142 0 0 75 HEIGHT BALANCED
T TIMESTAMP 1218 0 0 75 HEIGHT BALANCED
T STATUS 2 0 0 2 FREQUENCY
T TEMPORARY 2 0 0 2 FREQUENCY
T GENERATED 2 0 0 2 FREQUENCY
T SECONDARY 2 0 0 2 FREQUENCY
T NAMESPACE 19 0 0 19 FREQUENCY
T EDITION_NAME 0 0 65494 0 NONE
三、直方图
当某列数据分布不均衡,为了让CBO能生成最佳的执行计划,我们可能需要对表收集直方图,直方图最大的桶数(Bucket)是254。
OLTP系统中没有必要千万不用收集直方图,因为OLTP一般会用绑定变量,但11g之前一直有绑定变量窥探的问题,导致收集的直方图没有作用;
HISTOGRAM表示直方图的类型:
FREQUECNCY频率直方图、 当列中Distinct_keys 较少(小于254),如果不手工指定<span style="font-family: Arial, Helvetica, sans-serif;">直方图桶数(BUCKET)</span>,Oracle就会自动的创建频率直方图,并且桶数(BUCKET)等于Distinct_Keys。
HEIGHT BALANCED 高度平衡直方图 当列中Distinct_keys大于254,如果不手工指定直方图桶数(BUCKET),Oracle就会自动的创建高度平衡直方图。(默认75)
在有直方图的情况下执行sql
SCOTT@fyl>select * from t where owner='SYS';
30530 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 30530 | 2921K| 262 (1)| 00:00:04 |
|* 1 | TABLE ACCESS FULL| T | 30530 | 2921K| 262 (1)| 00:00:04 |
--------------------------------------------------------------------------
SCOTT@fyl>select * from t where owner='SCOTT';
10 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3630032853
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 980 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T | 10 | 980 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IND_T_OWNER | 10 | | 1 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------
owner=SYS时查询30530 估算Rows30530 全表扫描 正确
owner-SCOTT时查询10 估算Rows10 索引 正确
去掉直方图(method_opt => 'for all columns size 1')
SCOTT@fyl>select * from t where owner='SYS';
30530 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3630032853
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3447 | 329K| 102 (0)| 00:00:02 |
| 1 | TABLE ACCESS BY INDEX ROWID| T | 3447 | 329K| 102 (0)| 00:00:02 |
|* 2 | INDEX RANGE SCAN | IND_T_OWNER | 3447 | | 9 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------
SCOTT@fyl>select * from t where owner='SCOTT';
10 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3630032853
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3447 | 329K| 102 (0)| 00:00:02 |
| 1 | TABLE ACCESS BY INDEX ROWID| T | 3447 | 329K| 102 (0)| 00:00:02 |
|* 2 | INDEX RANGE SCAN | IND_T_OWNER | 3447 | | 9 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------
owner=SYS时 查询30530 估算Rows3447 <span style="font-family: Arial, Helvetica, sans-serif;">NUM_ROWS / </span><span style="font-family: Arial, Helvetica, sans-serif;">NUM_DISTINCT=NUM_ROWS*DENSITY=69454/19=3447 错误</span>
owner=SCOTT时 查询10 估算Rows3447 <span style="font-family: Arial, Helvetica, sans-serif;">NUM_ROWS </span><span style="font-family: Arial, Helvetica, sans-serif;">/ NUM_DISTINCT=69454/19=3447 错误</span>
直方图告诉CBO这个列是不是均衡的
1. 没收集直方图 ---CBO认为这个列是分布均匀的;
2. 收集过了 ---告诉CBO这个列数据有问题 分布不均衡 你别算错了 特别是频率直方图算的会很准最终就是影响rows
user_tab_histograms视图可以查询直方图桶数(BUCKET)与数值分布情况
四、聚簇因子
1、堆表的存储方式
Oralce 数据库系统中最普通,最为常用的即为堆表。堆表的数据存储方式为无序存储,也就是任意的DML操作都可能使得当前数据块存在可用的空闲空间。处于节省空间的考虑,块上的可用空闲空间会被新插入的行填充,而不是按顺序填充到最后被使用的块上。上述的操作方式导致了数据的无序性的产生。当创建索引时,会根据指定的列按顺序来填充到索引块,缺省的情况下为升序。新建或重建索引时,索引列上的顺序是有序的,而表上的顺序是无序的,也就是存在了差异,即表现为聚簇因子。
2、什么是聚簇因子(clustering factor/CF)
聚簇因子是基于表上索引列上的一个值,每一个索引都有一个聚簇因子。用于描述索引块上与表块上存储数据在顺序上的相似程度,也就说表上的数据行的存储顺序与索引列上顺序是否一致。在全索引扫描中,CF的值基本上等同于物理I/O或块访问数,如果相同的块被连续读,则Oracle认为只需要1次物理I/O。好的CF值接近于表上的块数,而差的CF值则接近于表上的行数。聚簇因子在索引创建时就会通过表上存存在的行以及索引块计算获得。
3、Oracle 如何计算聚簇因子
执行或预估一次全索引扫描。检查索引块上每一个rowid的值,查看是否前一个rowid的值与后一个指向了相同的数据块,如果指向了不相同的数据块则CF的值增加1。当索引块上的每一个rowid被检查完毕,即得到最终的CF值。
如果ClusteringFactor接近于表存储的块数,说明这张表是按照索引字段的顺序存储的。
如果ClusteringFactor接近于行的数量,那说明这张表不是按索引字段顺序存储的。
实验
新建表
SQL> create table test as select * from emp;
SQL> commit;
新建索引
SQL> create index test_enmno_ind on test (empno);
SQL> select * from test where empno=7788;
Execution Plan
----------------------------------------------------------
Plan hash value: 845540399
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 32 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| TEST | 1 | 32 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | TEST_ENMNO_IND | 1 | | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("EMPNO"=7788)
分析表
SQL> analyze table test compute statistics;
查看列值聚簇因子
select t.table_name,t.NUM_ROWS,t.BLOCKS,t.AVG_ROW_LEN,i.index_name,i.CLUSTERING_FACTOR
from dba_tables t,dba_indexes i
where t.table_name=i.table_name
and t.owner='SCOTT'
and t.table_name='TEST';
TABLE_NAME NUM_ROWS BLOCKS AVG_ROW_LEN INDEX_NAME CLUSTERING_FACTOR
---------- ----- ------- ----------- ------------------- -----------------
TEST 14 4 40 TEST_ENAME_IND 1
14行*40字节=560字节,一个块8192个字节,560/8192=0.0684;一个test表占一个块的6.84%
SQL> alter table test pctfree 93;
SQL> insert into test select * from emp;
14 rows created.
SQL> / ----每插一个emp表,都占一个块
14 rows created.
SQL> / --查10个
SQL> commit;
Commit complete.
SQL> analyze table test compute statistics;
Table analyzed.
SQL> select * from test where empno=7788;
10 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 217508114
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 320 | 5 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TEST | 10 | 320 | 5 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("EMPNO"=7788)
SQL> select t.table_name,t.NUM_ROWS,t.BLOCKS,t.AVG_ROW_LEN,i.index_name,i.CLUSTERING_FACTOR
2 from dba_tables t,dba_indexes i
3 where t.table_name=i.table_name
4 and t.owner='SCOTT'
5 and t.table_name='TEST';
TABLE_NAME NUM_ROWS BLOCKS AVG_ROW_LEN INDEX_NAME CLUSTERING_FACTOR
------------ ---------- ---------- ----------- ------------------------------ -----------------
TEST 140 16 40 TEST_ENMNO_IND 140
解决办法:
SQL> create table test_temp as select * from test;
SQL> truncate table test;
SQL> insert into test select * from test_temp order by empno;
SQL> alter index rebuild online;
SQL> drop table test_temp;
SQL> commit;
再次查看:
SQL> select * from test where empno=7788;
10 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 845540399
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 320 | 3 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| TEST | 10 | 320 | 3 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | TEST_ENMNO_IND | 10 | | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("EMPNO"=7788)
SQL> /
TABLE_NAME NUM_ROWS BLOCKS AVG_ROW_LEN INDEX_NAME CLUSTERING_FACTOR
------------ ---------- ---------- ----------- --------------- -----------------
TEST 140 13 40 TEST_ENMNO_IND 16
直方图:
SQL> select * from test where rownum < 10;
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
7369 SMITH CLERK 7902 17-DEC-80 800 20
SQL> begin
2 for i in 1 .. 10000 loop
3 insert into test values (7369,'SMITH','CLERK',7902,sysdate,800,1000,20);
4 end loop;
5 commit;
6 end;
7 /
PL/SQL procedure successfully completed.
SQL> analyze table test compute statistics;
Table analyzed.
查询数据很大,但是还走索引
SQL> select * from test where empno=7369;
10010 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 845540399
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 724 | 22444 | 60 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| TEST | 724 | 22444 | 60 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | TEST_ENMNO_IND | 724 | | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("EMPNO"=7369)
插数据的时候很集中,所以聚簇因子与块数接近
SQL> /
TABLE_NAME NUM_ROWS BLOCKS AVG_ROW_LEN INDEX_NAME CLUSTERING_FACTOR
------------ ---------- ---------- ----------- ---------------- -----------------
TEST 10140 810 40 TEST_ENMNO_IND 787
解决办法:
SQL> analyze table test compute statistics for columns empno;
SQL> select * from test where empno=7369;
10010 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 217508114
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10010 | 303K| 182 (2)| 00:00:03 |
|* 1 | TABLE ACCESS FULL| TEST | 10010 | 303K| 182 (2)| 00:00:03 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("EMPNO"=7369)