oracle实验记录 (oracle单表选择率与基数计算(1))

最新推荐文章于 2024-09-14 19:55:29 发布

cuiye1492

最新推荐文章于 2024-09-14 19:55:29 发布

阅读量163

点赞数

文章标签：数据库

单表选择率与基数(cardinality)有关系，也就是执行计划中的CARD 信息
CARD表示oracle 认为此操作将返回多少rows

基本的CARD 计算就等于 num_rows*选择率
CARD 对于otimizer 选初始的连接顺序，选INDEX 都有很大影响,所以能够看明白计算过程方法，对优化是很有帮助的

实验单表中选择率 (本次实验学习参考cost-based oracle)
SQL> create table t1(a int);

表已创建。

SQL> ed
已写入 file afiedt.buf

1 declare
2 begin
3 for i in 1..10000 loop
4 insert into t1 values(i);
5 end loop;
6 commit;
7* end;
SQL> /

PL/SQL 过程已成功完成。

SQL> execute dbms_stats.gather_table_stats('SYS','T1');

PL/SQL 过程已成功完成。

SQL> select table_name,density,num_nulls,num_distinct from user_tab_col_statisti
cs where table_name='T1';

TABLE_NAME DENSITY NUM_NULLS NUM_DISTINCT
------------------------------ ---------- ---------- ------------
T1 .0001 0 10000

SQL> set autotrace traceonly explain
SQL> select count(*) from t1;

执行计划
----------------------------------------------------------
Plan hash value: 3724264953

-------------------------------------------------------------------
| Id | Operation          | Name | Rows | Cost (%CPU)| Time     |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     6   (0)| 00:00:01 |
|   1 | SORT AGGREGATE    |      |     1 |            |          |
|   2 |   TABLE ACCESS FULL| T1   | 10000 |     6   (0)| 00:00:01 |
-------------------------------------------------------------------

Note
-----
- dynamic sampling used for this statement

~没有谓词没有过滤扫描所有行所以card=10000

SQL> select count(*) from t1 where a=2;

执行计划
----------------------------------------------------------
Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id | Operation          | Name | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     6   (0)| 00:00:01 |
|   1 | SORT AGGREGATE    |      |     1 |     3 |            |          |
|* 2 |   TABLE ACCESS FULL| T1   |     1 |     3 |     6   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("A"=2)

card=num_row*1/num_distinct=1
SQL> select 10000/10000 from dual;

10000/10000
-----------
1
如果要是这个表存在HISTOGRAM 那么，OPTIMIZER将使用DENSITY 列，CARD=NUM_rows*density

以上的如果谓词就是COLUMN=字面值那么 card=num_row/num_distinct (=num_row*1/num_distinct)如果有histogram card=NUM_rows*density

~~~~~~~~~~~~~~~~~~~~~~~~~~没有统计信息，使用动态采集
SQL> create table t2 (a int);

表已创建。

SQL> declare
2 begin
3 for i in 1..10000 loop
4 insert into t2 values(i);
5 end loop;
6 commmit;
7 end;
8 /
commmit;

SQL> ed
已写入 file afiedt.buf

1 declare
2 begin
3 for i in 1..10000 loop
4 insert into t2 values(i);
5 end loop;
6 commit;
7* end;
8 /

PL/SQL 过程已成功完成。

SQL> select table_name,density,num_nulls,num_distinct from user_tab_col_statisti
cs where table_name='T2';~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~没有统计信息

未选定行

SQL> show parameter dyna

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
optimizer_dynamic_sampling integer 2~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~oracle将使用动态采集

SQL> select count(*) from t2;

执行计划
----------------------------------------------------------
Plan hash value: 3321871023

-------------------------------------------------------------------
| Id | Operation          | Name | Rows | Cost (%CPU)| Time     |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     6   (0)| 00:00:01 |
|   1 | SORT AGGREGATE    |      |     1 |            |          |
|   2 |   TABLE ACCESS FULL| T2   | 10000 |     6   (0)| 00:00:01 |
-------------------------------------------------------------------

Note
-----
- dynamic sampling used for this statement

SQL> select count(*) from t2 where a=2;

执行计划
----------------------------------------------------------
Plan hash value: 3321871023

---------------------------------------------------------------------------
| Id | Operation          | Name | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    13 |     6   (0)| 00:00:01 |
|   1 | SORT AGGREGATE    |      |     1 |    13 |            |          |
|* 2 |   TABLE ACCESS FULL| T2   |     1 |    13 |     6   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("A"=2)~~~~~~~~

Note
-----
- dynamic sampling used for this statement~~~~~~~~~~~~~~~~~~~~~可以看到 ORACLE 利用动态采集的信息也是很准确的

Dynamic Sampling Levels(以下是动态采集各个LEVEL 说明)
The sampling levels are as follows if the dynamic sampling level used is from a cursor hint or from the OPTIMIZER_DYNAMIC_SAMPLING initialization parameter:

Level 0: Do not use dynamic sampling.

Level 1: Sample all tables that have not been analyzed if the following criteria are met: (1) there is at least 1 unanalyzed table in the query; (2) this

unanalyzed table is joined to another table or appears in a subquery or non-mergeable view; (3) this unanalyzed table has no indexes; (4) this unanalyzed

table has more blocks than the number of blocks that would be used for dynamic sampling of this table. The number of blocks sampled is the default number of

dynamic sampling blocks (32).

Level 2: Apply dynamic sampling to all unanalyzed tables. The number of blocks sampled is two times the default number of dynamic sampling blocks.

Level 3: Apply dynamic sampling to all tables that meet Level 2 criteria, plus all tables for which standard selectivity estimation used a guess for some

predicate that is a potential dynamic sampling predicate. The number of blocks sampled is the default number of dynamic sampling blocks. For unanalyzed

tables, the number of blocks sampled is two times the default number of dynamic sampling blocks.

Level 4: Apply dynamic sampling to all tables that meet Level 3 criteria, plus all tables that have single-table predicates that reference 2 or more columns.

The number of blocks sampled is the default number of dynamic sampling blocks. For unanalyzed tables, the number of blocks sampled is two times the default

number of dynamic sampling blocks.

Levels 5, 6, 7, 8, and 9: Apply dynamic sampling to all tables that meet the previous level criteria using 2, 4, 8, 32, or 128 times the default number of

dynamic sampling blocks respectively.

Level 10: Apply dynamic sampling to all tables that meet the Level 9 criteria using all blocks in the table.

The sampling levels are as follows if the dynamic sampling level for a table is set using the DYNAMIC_SAMPLING optimizer hint:

Level 0: Do not use dynamic sampling.

Level 1: The number of blocks sampled is the default number of dynamic sampling blocks (32).

Levels 2, 3, 4, 5, 6, 7, 8, and 9: The number of blocks sampled is 2, 4, 8, 16, 32, 64, 128, or 256 times the default number of dynamic sampling blocks

respectively.

Level 10: Read all blocks in the table.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SQL> alter system set optimizer_dynamic_sampling=1;
系统已更改。
SQL> alter system set optimizer_dynamic_sampling=1; ~~~~关闭动态采集

系统已更改。
SSQL> set autotrace traceonly explain
SQL> alter session set events '10053 trace name context forever';

会话已更改。

SQL> select count(*) from t2;

执行计划
----------------------------------------------------------
Plan hash value: 3321871023

-------------------------------------------------------------------
| Id | Operation          | Name | Rows | Cost (%CPU)| Time     |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     2   (0)| 00:00:01 |
|   1 | SORT AGGREGATE    |      |     1 |            |          |
|   2 |   TABLE ACCESS FULL| T2   |    82 |     2   (0)| 00:00:01 |
-------------------------------------------------------------------

SQL> select count(*) from t2 where a=2;

执行计划
----------------------------------------------------------
Plan hash value: 3321871023

---------------------------------------------------------------------------
| Id | Operation          | Name | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    13 |     2   (0)| 00:00:01 |
|   1 | SORT AGGREGATE    |      |     1 |    13 |            |          |
|* 2 |   TABLE ACCESS FULL| T2   |     1 |    13 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("A"=2)

~~从trace中可以看到信息十分不准确
Table Stats::
Table: T2 Alias: T2 (NOT ANALYZED)
    #Rows: 82 #Blks: 1 AvgRowLen: 100.00
***************************************
SINGLE TABLE ACCESS PATH
Table: T2 Alias: T2
    Card: Original: 82 Rounded: 82 Computed: 82.00 Non Adjusted: 82.00
Access Path: TableScan
    Cost: 2.00 Resp: 2.00 Degree: 0
      Cost_io: 2.00 Cost_cpu: 19421
      Resp_io: 2.00 Resp_cpu: 19421
Best:: AccessPath: TableScan
         Cost: 2.00 Degree: 1 Resp: 2.00 Card: 82.00 Bytes: 0

Table Stats::
Table: T2 Alias: T2 (NOT ANALYZED) ~~~~~~~~~~~没统计信息(按公式算的话oracle 自己决定num_distinct 为100)
    #Rows: 82 #Blks: 1 AvgRowLen: 100.00
***************************************
SINGLE TABLE ACCESS PATH
Column (#1): A(NUMBER) NO STATISTICS (using defaults)
    AvgLen: 22.00 NDV: 3 Nulls: 0 Density: 0.39024
Table: T2 Alias: T2
    Card: Original: 82 Rounded: 1 Computed: 0.82 Non Adjusted: 0.82
Access Path: TableScan
    Cost: 2.00 Resp: 2.00 Degree: 0
      Cost_io: 2.00 Cost_cpu: 23521
      Resp_io: 2.00 Resp_cpu: 23521
Best:: AccessPath: TableScan
         Cost: 2.00 Degree: 1 Resp: 2.00 Card: 0.82 Bytes: 0
````````````````````````````````````````````````````````````````````
一个问题
关于列中值严重不均匀

create table t3(a int)
SQL> ed
已写入 file afiedt.buf

1 declare
2 begin
3 for i in 1..1000 loop
4 insert into t3 values(1);
5 end loop;
6 for i in 2..9001 loop
7 insert into t3 values(i);
8 end loop;
9 commit;
10* end;
11 /

PL/SQL 过程已成功完成。

SQL> execute dbms_stats.gather_table_stats('SYS','T3');

PL/SQL 过程已成功完成。

SQL> select count(*) from t3;

COUNT(*)
----------
10000

SQL> select num_rows from user_tables where table_name='T3';

NUM_ROWS
----------
10000

SQL> select table_name,density,num_nulls,num_distinct from user_tab_col_statisti
cs where table_name='T3';

TABLE_NAME DENSITY NUM_NULLS NUM_DISTINCT
------------------------------ ---------- ---------- ------------
T3 .000111099 0 9001

SQL> select count(*) from t3 where a=1;

COUNT(*)
----------
1000

从上面信息可以看到 ~~列A 有10000ROWS 其中A=1的 1000 ROWS， A=2 到9001 的有9000 ROW
这样NUM_DISTINCT=9001

SQL> set autotrace traceonly explain
SQL> alter session set events '10053 trace name context forever';

会话已更改。
SQL> select count(*) from t3 where a=1;

执行计划
----------------------------------------------------------
Plan hash value: 463314188

---------------------------------------------------------------------------
| Id | Operation          | Name | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     7   (0)| 00:00:01 |
|   1 | SORT AGGREGATE    |      |     1 |     3 |            |          |
|* 2 |   TABLE ACCESS FULL| T3   |     1 |     3 |     7   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("A"=1)

可以看到CARD 非常不准确还是按公式 num_rows*1/num_distinct算的
SQL> select 10000*1/9001 from dual;

10000*1/9001
------------
1.11098767

***********************
Table Stats::
Table: T3 Alias: T3
    #Rows: 10000 #Blks: 23 AvgRowLen: 3.00
***************************************
SINGLE TABLE ACCESS PATH
Column (#1): A(NUMBER)
    AvgLen: 4.00 NDV: 9001 Nulls: 0 Density: 1.1110e-004 Min: 1 Max: 9001
Table: T3 Alias: T3
    Card: Original: 10000 Rounded: 1 Computed: 1.11 Non Adjusted: 1.11
Access Path: TableScan
    Cost: 7.14 Resp: 7.14 Degree: 0
      Cost_io: 7.00 Cost_cpu: 2163793
      Resp_io: 7.00 Resp_cpu: 2163793
Best:: AccessPath: TableScan
         Cost: 7.14 Degree: 1 Resp: 7.14 Card: 1.11 Bytes: 0~~~~~~~~~~~~~~~~~~看到trace 中card =1.11 ，实际应该为1000
我们可以用HISTOGREAM来解决

1 begin
2     dbms_stats.gather_table_stats(
3             'SYS',
4             't3',
5             cascade => true,
6             estimate_percent => null,
7             method_opt => 'for all columns size 120'
8     );
9* end;
SQL> /

PL/SQL 过程已成功完成。

使用HISTOGRAM收集统计信息后 ORACLE 将使用DENSITY 来计算

SQL> select table_name,density,num_nulls,num_distinct from user_tab_col_statisti
cs where table_name='T3';

TABLE_NAME DENSITY NUM_NULLS NUM_DISTINCT
------------------------------ ---------- ---------- ------------
T3 .0001 0 9001

SQL> set autotrace traceonly explain
SQL> alter session set events '10053 trace name context forever';

会话已更改。

SQL> select count(*) from t3 where a=1;

执行计划
----------------------------------------------------------
Plan hash value: 463314188

---------------------------------------------------------------------------
| Id | Operation          | Name | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     7   (0)| 00:00:01 |
|   1 | SORT AGGREGATE    |      |     1 |     3 |            |          |
|* 2 |   TABLE ACCESS FULL| T3   |   917 | 2751 |     7   (0)| 00:00:01 |~~~~~~~~~~可以看到CARD接近了=917
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("A"=1)

SQL> select count(*) from t3 where a=2;

执行计划
----------------------------------------------------------
Plan hash value: 463314188

---------------------------------------------------------------------------
| Id | Operation          | Name | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     7   (0)| 00:00:01 |
|   1 | SORT AGGREGATE    |      |     1 |     3 |            |          |
|* 2 |   TABLE ACCESS FULL| T3   |     1 |     3 |     7   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("A"=2)

SQL>
上面可以看出 A=1时候CARD=917 ORACLE 并没有使用density来计算， A=2时候用的DENSITY计算的 card=10000*0.0001=1
针对a=1 oracle 使用的 histogram bucket来计算的

SQL> select endpoint_number,endpoint_value from dba_histograms where table_name=
'T3' and rownum<5;

ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- --------------
             11              1
             12              9
             13             93
             14            177

可以看出histogram是一个高度均衡的（HISTOGRAM分2种第一中频率histogram 每个bucket存一个值当列的histogram bucket >=列distinct时候为频率，另一种高度均衡

HISTOGRAM
当列DISTINCT比 histogram bucket多时为高度均衡每个bucket存相同数量的值）
此例为高度均衡可以看出来 1-11 BUCKET 存值1，其实第12个BUCKET中也存值1，因为12-13中可以看出每个BUCKET 存 84个值(93-9)（此例中除值1外其他都唯一，所以很好看

出来），而第12个 BUCKET 最高存的值为9 这样表示第12个BUCKET还存了值1（应该是76 =84-8 ，8是2到9的个数） ,oracle 发现值1 是一个高频率出现的值（跨越的多个

桶），oracle将采用BUCKET计算11/120（跨越bucket数/总bucket数）=.091666667 从这里看到了虽然BUCKET12中也有值1 但ORACLE没算进来（因为该bucket中还有其它值2-8）用

11/120 这也就造成了CARD只是接近而不是准确的1000
关于histogram 后面将会有与histogram详细的实验~~
card=10000*.091666667= 916.66667
看下trace

SINGLE TABLE ACCESS PATH
Column (#1): A(NUMBER)
    AvgLen: 4.00 NDV: 9001 Nulls: 0 Density: 1.0000e-004 Min: 1 Max: 9001
    Histogram: HtBal #Bkts: 120 UncompBkts: 120 EndPtVals: 110
Table: T3 Alias: T3
    Card: Original: 10000 Rounded: 917 Computed: 916.67 Non Adjusted: 916.67
Access Path: TableScan
    Cost: 7.14 Resp: 7.14 Degree: 0
      Cost_io: 7.00 Cost_cpu: 2163793
      Resp_io: 7.00 Resp_cpu: 2163793
Best:: AccessPath: TableScan
         Cost: 7.14 Degree: 1 Resp: 7.14 Card: 916.67 Bytes: 0 ************ Card: 916.67
***************************************

~列中有NULL的情况

SQL> create table t6 (a int);

表已创建。

1 declare
2    begin
3     for i in 1..1000 loop
4      insert into t6 values(1);
5    end loop;
6     for i in 2..9001 loop
7     insert into t6 values(i);
8     end loop;
9     commit;
10* end;
11 /

PL/SQL 过程已成功完成。

SQL> update t6 set a=null where a>8001;

已更新1000行。

SQL> commit;

提交完成。

SQL> execute dbms_stats.gather_table_stats('SYS','T6');

PL/SQL 过程已成功完成。

SQL> select table_name,density,num_nulls,num_distinct from user_tab_col_statisti
cs where table_name='T6';

TABLE_NAME DENSITY NUM_NULLS NUM_DISTINCT
------------------------------ ---------- ---------- ------------
T6 .000124984 1000 8001

SQL> select num_rows from user_tables where table_name='T6';

NUM_ROWS
----------
10000
可以看出工10000ROWS 空值为1000

SQL> set autotrace traceonly explain
SQL> alter session set events '10053 trace name context forever';

会话已更改。

SQL> select count(*) from t6 where a=2;

执行计划
----------------------------------------------------------
Plan hash value: 4096694858

---------------------------------------------------------------------------
| Id | Operation          | Name | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     6   (0)| 00:00:01 |
|   1 | SORT AGGREGATE    |      |     1 |     3 |            |          |
|* 2 |   TABLE ACCESS FULL| T6   |     1 |     3 |     6   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("A"=2)

包含NULL的选择率按以下公式算
Adjusted selectivity = Base selectivity * (num_rows - num_nulls) / num_rows

=1/8001*((10000-1000)/10000)

card = NUM_ROWS*Adjusted selectivity=10000*(/8001*((10000-1000)/10000))

SQL> select 1/8001*((10000-1000)/10000) from dual;

1/8001*((10000-1000)/10000)
---------------------------
.000112486

SQL> select 10000*0.000112486 from dual;

10000*0.000112486
-----------------
1.12486

SINGLE TABLE ACCESS PATH
Column (#1): A(NUMBER)
    AvgLen: 4.00 NDV: 8001 Nulls: 1000 Density: 1.2498e-004 Min: 1 Max: 8001
Table: T6 Alias: T6
    Card: Original: 10000 Rounded: 1 Computed: 1.12 Non Adjusted: 1.12
Access Path: TableScan
    Cost: 6.14 Resp: 6.14 Degree: 0
      Cost_io: 6.00 Cost_cpu: 2142429
      Resp_io: 6.00 Resp_cpu: 2142429
Best:: AccessPath: TableScan
         Cost: 6.14 Degree: 1 Resp: 6.14 Card: 1.12 Bytes: 0====================card 1.12

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

关于使用BIND
SQL> set autotrace traceonly explain
SQL> set autotrace off
SQL> variable a number
SQL> set autotrace traceonly explain
SQL> select count(*) from t6 where a=:a;*********************

执行计划
----------------------------------------------------------
Plan hash value: 4096694858

---------------------------------------------------------------------------
| Id | Operation          | Name | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     6   (0)| 00:00:01 |
|   1 | SORT AGGREGATE    |      |     1 |     3 |            |          |
|* 2 |   TABLE ACCESS FULL| T6   |     1 |     3 |     6   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("A"=TO_NUMBER(:A))

对于BIND 所使用的选择率= COLUMN=字面值时候的选择率（字面值情况也要分是否列中有NULL来计算，此例中列中有NULL）
SQL> select table_name,density,num_nulls,num_distinct from user_tab_col_statisti
cs where table_name='T6';

TABLE_NAME DENSITY NUM_NULLS NUM_DISTINCT
------------------------------ ---------- ---------- ------------
T6 .000124984 1000 8001~~~~~~~~~~~~~~~~~包含NULL

SQL> select num_rows from user_tables where table_name='T6';

NUM_ROWS
----------
10000

Adjusted selectivity = Base selectivity * (num_rows - num_nulls) / num_rows~~~~~~~~~~~~~~~调整选择率

=1/8001*((10000-1000)/10000)
BIN选择率=Adjusted selectivity（是字面值时候情况的选择率）
card = NUM_ROWS*Adjusted selectivity=10000*(/8001*((10000-1000)/10000))

SQL> select 1/8001*((10000-1000)/10000) from dual;

1/8001*((10000-1000)/10000)
---------------------------
.000112486

SQL> select 10000*0.000112486 from dual;

10000*0.000112486
-----------------
          1.12486
trace
SINGLE TABLE ACCESS PATH
Column (#1): A(NUMBER)
    AvgLen: 4.00 NDV: 8001 Nulls: 1000 Density: 1.2498e-004 Min: 1 Max: 8001
Table: T6 Alias: T6
    Card: Original: 10000 Rounded: 1 Computed: 1.12 Non Adjusted: 1.12
Access Path: TableScan
    Cost: 6.21 Resp: 6.21 Degree: 0
      Cost_io: 6.00 Cost_cpu: 3142429
      Resp_io: 6.00 Resp_cpu: 3142429
Best:: AccessPath: TableScan
         Cost: 6.21 Degree: 1 Resp: 6.21 Card: 1.12 Bytes: 0
***************************************

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
使用INLIST
SQL> create table t4 (a int);

表已创建。

SQL> declare
2 begin
3 for i in 1..10000 loop
4 insert into t4 values(i);
5 end loop;
6 commit;
7 end;
8 /

PL/SQL 过程已成功完成。

SQL> execute dbms_stats.gather_table_stats('SYS','T4');

PL/SQL 过程已成功完成。

SQL> select num_rows from user_tables where table_name='T4';

NUM_ROWS
----------
10000

SQL> select table_name,density,num_nulls,num_distinct from user_tab_col_statisti
cs where table_name='T4';

TABLE_NAME DENSITY NUM_NULLS NUM_DISTINCT
------------------------------ ---------- ---------- ------------
T4 .0001 0 10000
SQL> col table_name format a10
SQL> select table_name,density,num_nulls,num_distinct,HIGH_VALUE from user_tab_c
ol_statistics where table_name='T4';

TABLE_NAME DENSITY NUM_NULLS NUM_DISTINCT HIGH_VALUE
---------- ---------- ---------- ------------ --------------------
T4 .0001 0 10000 C302

SQL> variable a number
SQL> variable b number
SQL> set autotrace traceonly explain
SQL> alter session set events '10053 trace name context forever';

会话已更改。

SQL> select count(*) from t4 where a in (1,2);~~~~~~~~~2个

执行计划
----------------------------------------------------------
Plan hash value: 405148644

---------------------------------------------------------------------------
| Id | Operation          | Name | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     6   (0)| 00:00:01 |
|   1 | SORT AGGREGATE    |      |     1 |     3 |            |          |
|* 2 |   TABLE ACCESS FULL| T4   |     2 |     6 |     6   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("A"=1 OR "A"=2)

SQL> select count(*) from t4 where a in (1,2,2);~~~~~~~~~~~~含重复值得

执行计划
----------------------------------------------------------
Plan hash value: 405148644

---------------------------------------------------------------------------
| Id | Operation          | Name | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     6   (0)| 00:00:01 |
|   1 | SORT AGGREGATE    |      |     1 |     3 |            |          |
|* 2 |   TABLE ACCESS FULL| T4   |     2 |     6 |     6   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("A"=1 OR "A"=2)

SQL> select count(*) from t4 where a in (1,2,3);~~~~~~~~~~3个

执行计划
----------------------------------------------------------
Plan hash value: 405148644

---------------------------------------------------------------------------
| Id | Operation          | Name | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     6   (0)| 00:00:01 |
|   1 | SORT AGGREGATE    |      |     1 |     3 |            |          |
|* 2 |   TABLE ACCESS FULL| T4   |     3 |     9 |     6   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("A"=1 OR "A"=2 OR "A"=3)

SQL> select count(*) from t4 where a in (1,2,2,null);~~~~~~~包含空值

执行计划
----------------------------------------------------------
Plan hash value: 405148644

---------------------------------------------------------------------------
| Id | Operation          | Name | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     6   (0)| 00:00:01 |
|   1 | SORT AGGREGATE    |      |     1 |     3 |            |          |
|* 2 |   TABLE ACCESS FULL| T4   |     3 |     9 |     6   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("A"=1 OR "A"=2 OR "A"=TO_NUMBER(NULL))

SQL> select count(*) from t4 where a in (1,100052);~~~~~~~~~超过该列的最大值（HIGHT_VALUE）

执行计划
----------------------------------------------------------
Plan hash value: 405148644

---------------------------------------------------------------------------
| Id | Operation          | Name | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     6   (0)| 00:00:01 |
|   1 | SORT AGGREGATE    |      |     1 |     3 |            |          |
|* 2 |   TABLE ACCESS FULL| T4   |     2 |     6 |     6   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("A"=1 OR "A"=100052)

SQL> select count(*) from t4 where a in (:a,:b);~~~~~~~~~~使用变量

执行计划
----------------------------------------------------------
Plan hash value: 405148644

---------------------------------------------------------------------------
| Id | Operation          | Name | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     6   (0)| 00:00:01 |
|   1 | SORT AGGREGATE    |      |     1 |     3 |            |          |
|* 2 |   TABLE ACCESS FULL| T4   |     2 |     6 |     6   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("A"=TO_NUMBER(:A) OR "A"=TO_NUMBER(:B))

Moreover, when you check the values for 9i and 10g, you see
that the cardinalities always come out as N * number of entries in the list until the number of
entries in the list exceeds the number of distinct values
结论：关于inlist card= n*(num_rows*1/num_distinct) (N 为中 inlist (数目 ))
oracle可以判断出重复值，~~检查不出是否超出最大值，是否含NULL

另外INLIST （N，N）中N的数目超过列的Num_distinct数据那么 card=num_distinct*(num_rows*1/num_distinct)=num_rows
早期8I的 IN LIST 计算是按OR算的
in (1,2,3)= 1 or 2 or 3 8I 是这么算的选择率
sel(A or B or C) =
sel(A) + sel(B) + sel(C) – Sel(A)sel(B) – Sel(B)sel(C) – sel(C)sel(A) + Sel(A)Sel(B)Sel(C)

8I INLIST CARD=NUM_ROWS*1/(sel(A) + sel(B) + sel(C) – Sel(A)sel(B) – Sel(B)sel(C) – sel(C)sel(A) + Sel(A)Sel(B)Sel(C))****

~~~~~~~~~~~~~~~~~~~~~~~NOT IN
SQL> select count(*) from t4 where a in (1,2,2);用HINTS USE_CONCAT那么可以将 OR操作转换为 UNION ALL 操作(3个SELECT FROM X WHERE =1 union all=2 union all=3)

执行计划
----------------------------------------------------------
Plan hash value: 405148644

---------------------------------------------------------------------------
| Id | Operation          | Name | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     6   (0)| 00:00:01 |
|   1 | SORT AGGREGATE    |      |     1 |     3 |            |          |
|* 2 |   TABLE ACCESS FULL| T4   |     2 |     6 |     6   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("A"=1 OR "A"=2)

SQL> select count(*) from t4 where a not in (1,2);

执行计划
----------------------------------------------------------
Plan hash value: 405148644

---------------------------------------------------------------------------
| Id | Operation          | Name | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     6   (0)| 00:00:01 |
|   1 | SORT AGGREGATE    |      |     1 |     3 |            |          |
|* 2 |   TABLE ACCESS FULL| T4   | 9998 | 29994 |     6   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("A"<>1 AND "A"<>2)

SQL> select count(*) from t4 where a not in (1,2,2);

执行计划
----------------------------------------------------------
Plan hash value: 405148644

---------------------------------------------------------------------------
| Id | Operation          | Name | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     6   (0)| 00:00:01 |
|   1 | SORT AGGREGATE    |      |     1 |     3 |            |          |
|* 2 |   TABLE ACCESS FULL| T4   | 9998 | 29994 |     6   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("A"<>1 AND "A"<>2)

SQL> select count(*) from t4 where a not in (1,2,3);

执行计划
----------------------------------------------------------
Plan hash value: 405148644

---------------------------------------------------------------------------
| Id | Operation          | Name | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     6   (0)| 00:00:01 |
|   1 | SORT AGGREGATE    |      |     1 |     3 |            |          |
|* 2 |   TABLE ACCESS FULL| T4   | 9997 | 29991 |     6   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("A"<>1 AND "A"<>2 AND "A"<>3)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~可以看出是自相容的 card=NUM_ROWS-n*(num_rows*1/num_distinct)

关于like*************

SQL> set autotrace traceonly explain
SQL> select count(*) from t6 where a like '%1';

执行计划
----------------------------------------------------------
Plan hash value: 4096694858

---------------------------------------------------------------------------
| Id | Operation          | Name | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     6   (0)| 00:00:01 |
|   1 | SORT AGGREGATE    |      |     1 |     3 |            |          |
|* 2 |   TABLE ACCESS FULL| T6   |   500 | 1500 |     6   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter(TO_CHAR("A") LIKE '%1')

SQL> select count(*) from t6 where a like '1%';

执行计划
----------------------------------------------------------
Plan hash value: 4096694858

---------------------------------------------------------------------------
| Id | Operation          | Name | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     6   (0)| 00:00:01 |
|   1 | SORT AGGREGATE    |      |     1 |     3 |            |          |
|* 2 |   TABLE ACCESS FULL| T6   |   500 | 1500 |     6   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter(TO_CHAR("A") LIKE '1%')

SQL> select count(*) from t6 where a like '1%';

执行计划
----------------------------------------------------------
Plan hash value: 4096694858

---------------------------------------------------------------------------
| Id | Operation          | Name | Rows | Bytes | Cost (%CPU)| Time     |
-----------ss----------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     6   (0)| 00:00:01 |
|   1 | SORT AGGREGATE    |      |     1 |     3 |            |          |
|* 2 |   TABLE ACCESS FULL| T6   |   500 | 1500 |     6   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter(TO_CHAR("A") LIKE '1%')

SQL> set autotrace off
SQL> select count(*) from t6 where a like '1%';

COUNT(*)
----------
2110

SQL>

选择率应该是5% CARD=NUM_ROW*5%=500

另外 LIKE %a 是不走INDEX 的，而 like a% 是走index的针对 LIKE%A 不走 , 可以建一个reverse(X) 的函数index 查询时候使用reverse(XX) like '%a' 可以走这个函数index

来自 “ ITPUB博客 ” ，链接：http://blog.itpub.net/12020513/viewspace-614301/，如需转载，请注明出处，否则将追究法律责任。

转载于:http://blog.itpub.net/12020513/viewspace-614301/