Oracle Frequency类型直方图对可选择率的影响

Oracle默认情况下认为表中列的数据时平均分布的,但实际上并不是如此,下面做个简单的简单的实验来看看Freqency直方图对可选择率的影响。Freqency类型的直方图时指收集表中列的统计信息时指定的Bucket大于等于表中列的DISTINCT值,否则收集到的是BANLACED 类型的直方图。

1、环境准备

**创建实验使用的表T1**
SCOTT@TNS_PDB01>create table t1(id number);

Table created.

SCOTT@TNS_PDB01>desc t1;
 Name							     Null?    Type
 ----------------------------------------------------------- -------- -----------------------------------------
 ID								      NUMBER

declare
        i number;
begin
        for i in 1 .. 3296 loop
                insert into t1 values(1);
        end loop;
        for i in 1 .. 100 loop
                insert into t1 values(3);
        end loop;
        for i in 1 .. 798 loop
                insert into t1 values(5);
        end loop;
        for i in 1 .. 3970 loop
                insert into t1 values(7);
        end loop;
        for i in 1 .. 16293 loop
                insert into t1 values(10);
        end loop;
        for i in 1 .. 3399 loop
                insert into t1 values(16);
        end loop;
        for i in 1 .. 3651 loop
                insert into t1 values(27);
        end loop;
        for i in 1 .. 3892 loop
                insert into t1 values(32);
        end loop;
        for i in 1 .. 3521 loop
                insert into t1 values(39);
        end loop;
        for i in 1 .. 1080 loop
                insert into t1 values(49);
        end loop;
        commit;
exception
        when others then
                dbms_output.put_line(sqlerrm);
end;
/

**可以看出表中数据不是均匀分布的**
SCOTT@TNS_PDB01>select id,count(id) from t1 group by id  order by id;

 ID  COUNT(ID)
--- ----------
  1	  3296
  3	   100
  5	   798
  7	  3970
 10	 16293
 16	  3399
 27	  3651
 32	  3892
 39	  3521
 49	  1080

10 rows selected.

2、不收集Freqency直方图信息的情况下收集统计信息


```sql
SCOTT@TNS_PDB01>exec dbms_stats.gather_table_stats(ownname => 'SCOTT',tabname => 'T1',estimate_percent => 100,method_opt => 'FOR COLUMNS ID SIZE 1',no_invalidate => false);

PL/SQL procedure successfully completed.

SCOTT@TNS_PDB01>select table_name,column_name,num_distinct,num_nulls,density,histogram from dba_tab_col_statistics where table_name = 'T1' and owner = 'SCOTT';

TABLE_NAME	COLUMN_NAM NUM_DISTINCT  NUM_NULLS    DENSITY HISTOGRAM
--------------- ---------- ------------ ---------- ---------- ---------------
T1		ID		     10 	 0	   .1 NONE


SCOTT@TNS_PDB01>select count(*) from t1 where id = 10;

  COUNT(*)
----------
     16293

SCOTT@TNS_PDB01>select * from table(dbms_xplan.display_cursor(sql_id => null,cursor_child_no => null,format => 'advanced'));

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------
SQL_ID	5gz7rn8s8kfyy, child number 0
-------------------------------------
select count(*) from t1 where id = 10

Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id  | Operation	   | Name | Rows  | Bytes | Cost (%CPU)| Time	  |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |	  |	  |	  |    19 (100)|	  |
|   1 |  SORT AGGREGATE    |	  |	1 |	3 |	       |	  |
|*  2 |   TABLE ACCESS FULL| T1   |  4000 | 12000 |    19   (0)| 00:00:01 |
---------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - SEL$1
   2 - SEL$1 / T1@SEL$1

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('19.1.0')
      DB_VERSION('19.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      FULL(@"SEL$1" "T1"@"SEL$1")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("ID"=10)

Column Projection Information (identified by operation id):
-----------------------------------------------------------

   1 - (#keys=0) COUNT(*)[22]
   2 - (rowset=1019)

Query Block Registry:
---------------------

  <q o="2" f="y"><n><![CDATA[SEL$1]]></n><f><h><t><![CDATA[T1]]></t><s><![
	CDATA[SEL$1]]></s></h></f></q>



52 rows selected.

SCOTT@TNS_PDB01>select num_rows from dba_tab_statistics where table_name = 'T1' and owner = 'SCOTT';

  NUM_ROWS
----------
     40000

SCOTT@TNS_PDB01>select num_distinct,num_nulls from dba_tab_col_statistics where table_name = 'T1' and owner = 'SCOTT';

NUM_DISTINCT  NUM_NULLS
------------ ----------
	  10	      0

SCOTT@TNS_PDB01>select 1/10*40000 from dual;

1/10*40000
----------
      4000


**可以看出执行计划中的Cardinality和实际的值不相等,是不准确的。**

3、收集表T1列ID的直方图信息

SCOTT@TNS_PDB01>exec dbms_stats.gather_table_stats(ownname => 'SCOTT',tabname => 'T1',estimate_percent => 100,method_opt => 'FOR COLUMNS ID SIZE AUTO',cascade => true,no_invalidate => false);

PL/SQL procedure successfully completed.

SCOTT@TNS_PDB01>select table_name,column_name,num_distinct,num_nulls,histogram from dba_tab_col_statistics where table_name = 'T1' and owner = 'SCOTT';

TABLE_NAME	COLUMN_NAM NUM_DISTINCT  NUM_NULLS HISTOGRAM
--------------- ---------- ------------ ---------- ---------------
T1		ID		     10 	 0 FREQUENCY

SCOTT@TNS_PDB01>select count(*) from t1 where id = 10;

  COUNT(*)
----------
     16293

SCOTT@TNS_PDB01>select * from table(dbms_xplan.display_cursor(sql_id => null,cursor_child_no => null,format => 'advanced'));

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------
SQL_ID	5gz7rn8s8kfyy, child number 1
-------------------------------------
select count(*) from t1 where id = 10

Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id  | Operation	   | Name | Rows  | Bytes | Cost (%CPU)| Time	  |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |	  |	  |	  |    19 (100)|	  |
|   1 |  SORT AGGREGATE    |	  |	1 |	3 |	       |	  |
|*  2 |   TABLE ACCESS FULL| T1   | 16293 | 48879 |    19   (0)| 00:00:01 |
---------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - SEL$1
   2 - SEL$1 / T1@SEL$1

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('19.1.0')
      DB_VERSION('19.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      FULL(@"SEL$1" "T1"@"SEL$1")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("ID"=10)

Column Projection Information (identified by operation id):
-----------------------------------------------------------

   1 - (#keys=0) COUNT(*)[22]
   2 - (rowset=1019)

Query Block Registry:
---------------------

  <q o="2" f="y"><n><![CDATA[SEL$1]]></n><f><h><t><![CDATA[T1]]></t><s><![
	CDATA[SEL$1]]></s></h></f></q>



52 rows selected.


SCOTT@TNS_PDB01>select table_name,column_name,endpoint_number,endpoint_value from dba_tab_histograms where table_name = 'T1' and owner = 'SCOTT';

TABLE_NAME	COLUMN_NAM ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- ---------- --------------- --------------
T1		ID		      3296		1
T1		ID		      3396		3
T1		ID		      4194		5
T1		ID		      8164		7
T1		ID		     24457	       10
T1		ID		     27856	       16
T1		ID		     31507	       27
T1		ID		     35399	       32
T1		ID		     38920	       39
T1		ID		     40000	       49

10 rows selected.

**表DBA_TAB_HISTOGRAM中ENDPOINT_VALUE是表T1中DISTINCT(ID)的值,而ENDPINT_NUMBER是ENDPOINT_VALUE数量的累加值**


SCOTT@TNS_PDB01>select id,count(*) from t1 group by id order by id;

 ID   COUNT(*)
--- ----------
  1	  3296
  3	   100
  5	   798
  7	  3970
 10	 16293
 16	  3399
 27	  3651
 32	  3892
 39	  3521
 49	  1080

10 rows selected.

**有直方图的情况下可选择率的计算 (我们这里谓词条件 = ENDPOITN_VALUE)**

CARDINALITY=NUM_ROWS*SELECTIVITY
SELECTIVITY=BUCKETSIZE/NUM_ROWS
BUCKETSIZE=CURRENT_ENDPOINT_NUMBER – PREVIOUS_ENDPOINT_NUMBER

SCOTT@TNS_PDB01>select (24457-8164)/40000 from dual;

(24457-8164)/40000
------------------
	   .407325

SCOTT@TNS_PDB01>select 40000*.407325 from dual;

40000*.407325
-------------
	16293

总结:可以看出实际计算出来的值和查询出来值以及执行计划中的Cardinality相同,在表中列数据分布不均匀的情况下一定要收集直方图信息来保证优化器走出正确的执行计划。

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值