直方图 density（密度）解释

最新推荐文章于 2025-03-23 00:15:00 发布

长烟慢慢

最新推荐文章于 2025-03-23 00:15:00 发布

阅读量1.6w

点赞数

分类专栏：被引用

被引用专栏收录该内容

40 篇文章

订阅专栏

本文详细解释了Oracle数据库中“密度”这一统计指标的意义及其计算方法。密度用于估算查询选择性，指导查询优化器选择最佳访问路径。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

引用源：http://blog.csdn.net/cymm_liu/article/details/8846141

Matt -- Thanks for the question regarding "density", version 8.1.6

Submitted on 6-Mar-2002 11:50 UTC

Tom's latest followup | Bookmark | Bottom

Last updated 28-Nov-2005 7:40

You Asked

What does the density column in dba_tab_columns mean?

and we said...

Here is a support note on the topic:


Article-ID:         <Note:43041.1>
Circulation:        PUBLISHED (LIMITED)
Folder:             server.Performance.SqlTuning
Topic:              Optimizer Terms
Title:              QTUNE: What is Density?
Document-Type:      BULLETIN
Impact:             MEDIUM
Skill-Level:        NOVICE
Updated-Date:       05-APR-2000 21:20:35
References:         <NOTE:50750.1> Shared-Refs:        

Content-Type:       TEXT/PLAIN
Keywords:           CBO; FEATURE; 
Products:           5/RDBMS (V7.X to V8.X); 
Platforms:          GENERIC;  

PURPOSE

To give an explanation of 'Density' as used by the CBO.

SCOPE & APPLICATION

For Oracle Support Analysts trying to understand how density is used by the
CBO when determining the best access path.

Density is a statistic used by the Cost Based Optimizer to give selectivity
estimates for columns where better information is unavailable (i.e. from 
histograms etc.). 

You must gather statistics on your table to get density statistics. 

You can see density statistics by looking at the following views:

USER_TAB_COLUMNS
ALL_TAB_COLUMNS
DBA_TAB_COLUMNS

Density is a column statistic and provides selectivity estimates for 
equi-join predicates (e.g. and A.COL1 = B.COL1) and equality predicates 
(e.g. COL1 = 5).

The density is expressed as a decimal number between 0 and 1. 
Values close to 1 indicate that this column is unselective
Values close to 0 indicate that this column is highly selective

The more selective a column, the less rows are likely to be returned by a 
query referencing this column in its predicate list. 

The column selectivity is part of the equation used to decide on the best 
path for a query to take to retrieve the data required in the most effective 
manner and hence impacts the final cost value for the query.

Density is calculated as follows:

Pre 7.3
~~~~~~~

    Density = 1 / Number of distinct NON null values

The number of distinct NON-null values for a column (COL1) on table TABLE1 
can be obtained as follows:

select distinct count(COL1) 
from   TABLE1
where  COL1 is not null;


7.3+
~~~~

The Density calculation has been refined by the use of histograms. If 
you have created histograms on your columns we can now use the histogram
information to give more accurate information. Otherwise the Density is 
calculated as before. With histograms we can use information on
popular and non-popular values to determine the selectivity. 

A non-popular value is one that does not span multiple bucket end points.
A popular value is one that spans multiple end points.

(Refer to <Note:50750.1> for details on histograms)

For non-popular values the density is calculated as the number of non-popular 
values divided by the total number of values. Formula:

    Density =  Number of non-popular values
               ----------------------------
                  total number of values

We only use the density statistic for non-popular values. 

Popular values calculate the selectivity of a particular column values by 
using histograms as follows:

The Selectivity for popular values is calculated as the number of end points 
spanned by that value divided by the total number of end points. Formula:

    Selectivity = Number of end points spanned by this value
                  ------------------------------------------
                         total number of end points

           [Top of Page]

Reviews

November 21, 2002 - 8am UTC

Bookmark | Bottom | Top

Reviewer: Stephan Bressler from germany

Hi,

what's the formula
Density =  Number of non-popular values
               ----------------------------
                  total number of values

In case there are no popular values, is the density = 1?

I thought there would be a density per bucket? Wouldn't that make sense?

Selectivity with bind variables (Oracle 9.2.0.5) November 26, 2005 - 1am UTC

Bookmark | Bottom | Top

Reviewer: Vladimir Sadilovskiy from MA, US

Tom,

Could you look into this test case:

SQL> create table t1 as select 1 f from all_objects;

Table created.

SQL> create index i_t1 on t1(f);

Index created.

SQL> exec dbms_stats.gather_table_stats(null,'T1',method_opt=>'for all columns size 
254',estimate_percent=>100, cascade=>true);

PL/SQL procedure successfully completed.

SQL> variable f number;
SQL> exec :f := 1;

PL/SQL procedure successfully completed.

SQL> set autotrace on
<b> with constants the plan is nice</b>
SQL> select count(*) from t1 where f=2;

  COUNT(*)
----------
         0


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=3)
   1    0   SORT (AGGREGATE)
   2    1     INDEX (RANGE SCAN) OF 'I_T1' (NON-UNIQUE) (Cost=1 Card=1
           Bytes=3)





Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          2  consistent gets
          0  physical reads
          0  redo size
        490  bytes sent via SQL*Net to client
        655  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> select count(*) from t1 where f=1;

  COUNT(*)
----------
     45765


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=8 Card=1 Bytes=3)
   1    0   SORT (AGGREGATE)
   2    1     TABLE ACCESS (FULL) OF 'T1' (Cost=8 Card=45765 Bytes=137
          295)





Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         73  consistent gets
          0  physical reads
          0  redo size
        493  bytes sent via SQL*Net to client
        655  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

<b>but with bind variable the plan is different</b>
SQL> select count(*) from t1 where f=:f;

  COUNT(*)
----------
     45765


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=3)
   1    0   SORT (AGGREGATE)
   2    1     INDEX (RANGE SCAN) OF 'I_T1' (NON-UNIQUE) (Cost=1 Card=4
          58 Bytes=1374)





Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         73  consistent gets
          0  physical reads
          0  redo size
        493  bytes sent via SQL*Net to client
        655  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

<b>1/NDV doesn't seem to match calculated density </b>
SQL> select object_id from dba_objects where object_name='T1';

 OBJECT_ID
----------
     56142

SQL> set pages 999
SQL> select * from hist_head$ where obj#=56142;

      OBJ#       COL# BUCKET_CNT    ROW_CNT  CACHE_CNT   NULL_CNT TIMESTAMP
---------- ---------- ---------- ---------- ---------- ---------- ---------
SAMPLE_SIZE    MINIMUM    MAXIMUM    DISTCNT
----------- ---------- ---------- ----------
LOWVAL
----------------------------------------------------------------
HIVAL                                                               DENSITY
---------------------------------------------------------------- ----------
   INTCOL#     SPARE1     SPARE2     AVGCLN     SPARE3     SPARE4
---------- ---------- ---------- ---------- ---------- ----------
     56142          1      45765          1          1          0 25-NOV-05
      45765          1          1          1
C102
C102                                                             .000010925
         1          1          2          3

Thank you.

- Vladimir

Followup November 26, 2005 - 12pm UTC:

autotrace (explain plan) do not BIND PEEK - so they can "lie"

To demnonstrate this and the effect of bind variable peeking, I'll run this script:


create table t
as
select 99 id, a.* from all_objects a;

update t set id = 1 where rownum = 1;

create index t_idx on t(id);
begin
   dbms_stats.gather_table_stats
   ( user, 'T',
     method_opt=>'for all indexed columns size 254',
     cascade=>TRUE );
end;
/
variable n number
alter session set sql_trace=true;
exec :n := 1
select count(object_type) from t n_is_1_first where id = :n;
exec :n := 99
select count(object_type) from t n_is_1_first where id = :n;
exec :n := 99
select count(object_type) from t n_is_99_first where id = :n;
exec :n := 1
select count(object_type) from t n_is_99_first where id = :n;



The tkprof was generated with:

tkprof tracefile outputfile aggregate=no explain=u/p

to get both an EXPLAIN PLAN as well as getting all four statements printed individually:

we start with :N of one, the optimizer will actually optimize this statement as if the literal 
number one was in the query itself:


BEGIN :n := 1; END;
********************************************************************************
select count(object_type) from t n_is_1_first where id = :n

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 70  (OPS$TKYTE)

we did a hard parse (misses = 1), the number one was put in place of :n and the query optimizer 
said "use index! it'll only get one row!".  In the following the row source operation plan is the 
ACTUAL PLAN used, the EXECUTION PLAN is what explain plan - which DOES NOT BIND PEEK - would give 
us.  NOTE THEY ARE DIFFERENT.  Explain plan doesn't have access to the binds:

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE
      1   TABLE ACCESS BY INDEX ROWID T
      1    INDEX RANGE SCAN T_IDX (object id 36010)


Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   GOAL: CHOOSE
      1   SORT (AGGREGATE)
      1    TABLE ACCESS   GOAL: ANALYZED (FULL) OF 'T'
********************************************************************************
BEGIN :n := 99; END;
********************************************************************************
select count(object_type) from t n_is_1_first where id = :n

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 70  (OPS$TKYTE)

This query was soft parsed, the plan already existed.  This too will therefore use an index range 
scan - even though in this case, a full scan would have been "better".  Again, explain plan "lies" 
to us here


Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE
  30689   TABLE ACCESS BY INDEX ROWID T
  30689    INDEX RANGE SCAN T_IDX (object id 36010)


Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   GOAL: CHOOSE
      1   SORT (AGGREGATE)
  30689    TABLE ACCESS   GOAL: ANALYZED (FULL) OF 'T'
********************************************************************************
BEGIN :n := 99; END;
********************************************************************************
select count(object_type) from t n_is_99_first where id = :n

Note that this is a hard parse (changed the correlation name in the query, new query).  This time 
- the optimizer plugs 99 into the query and optimizes.  We do full scan this time:

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 70  (OPS$TKYTE)

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE
  30689   TABLE ACCESS FULL T


Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   GOAL: CHOOSE
      1   SORT (AGGREGATE)
  30689    TABLE ACCESS   GOAL: ANALYZED (FULL) OF 'T'
********************************************************************************
BEGIN :n := 1; END;
********************************************************************************
select count(object_type) from t n_is_99_first where id = :n

but note that this is a soft parse, the full scan plan will be used - first one to run the query 
with bind variable inputs (first hard parse of it) gets to to "pick the plan".... So, basically the 
same query - just changed the correlation name and we have different plans:


Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 70  (OPS$TKYTE)

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE
      1   TABLE ACCESS FULL OBJ#(36009)


Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   GOAL: CHOOSE
      1   SORT (AGGREGATE)
      1    TABLE ACCESS   GOAL: ANALYZED (FULL) OF 'T'


You would need to use a 10053 trace to see the "true" cardinality in this case

Unexpected cardinality November 26, 2005 - 1pm UTC

Bookmark | Bottom | Top

Reviewer: Jonathan Lewis from UK

Test case does not reproduce on my 9.2.0.6


Note that what you have looks like the standard 1% estimate for function(col) = constant.  As if 
Oracle is treating this like a coercion problem.


select count(*) from t1 where f=:f;

  COUNT(*)
----------
     45765

Execution Plan
----------------------------------------------------------
0   SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=3)
1 0   SORT (AGGREGATE)
2 1     INDEX (RANGE SCAN) OF 'I_T1' (NON-UNIQUE) (Cost=1 Card=458 Bytes=1374)

    card = 458 = ceil(45765/100).


My plan gives card={actual number of rows in table).

What do you see as the filter_predicates if you run the bind variable version (with a to_number() 
on the bind) through dbms_xplan ?

Followup November 26, 2005 - 4pm UTC:

I didn't even look at the numbers to tell the truth, I saw "bind variable vs literal".....


but yes, I don't see it in 9206 - but in 9iR1:

ops$tkyte@ORA9IR1> create table t1 as select 1 f from all_objects;
 
Table created.
 
ops$tkyte@ORA9IR1>
ops$tkyte@ORA9IR1> create index i_t1 on t1(f);
 
Index created.
 
ops$tkyte@ORA9IR1>
ops$tkyte@ORA9IR1> exec dbms_stats.gather_table_stats(null,'T1',method_opt=>'for all columns size 
254',estimate_percent=>99.99, cascade=>true);
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR1>
ops$tkyte@ORA9IR1> variable f number
ops$tkyte@ORA9IR1> set autotrace traceonly explain
ops$tkyte@ORA9IR1> select * from t1 where f=1;
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=8 Card=31020 Bytes=93060)
   1    0   TABLE ACCESS (FULL) OF 'T1' (Cost=8 Card=31020 Bytes=93060)
 
 
 
ops$tkyte@ORA9IR1> select * from t1 where f=2;
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=1 Bytes=3)
   1    0   INDEX (RANGE SCAN) OF 'I_T1' (NON-UNIQUE) (Cost=1 Card=1 Bytes=3)
 
 
 
ops$tkyte@ORA9IR1> select * from t1 where f=:f;
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=310 Bytes=930)
   1    0   INDEX (RANGE SCAN) OF 'I_T1' (NON-UNIQUE) (Cost=1 Card=310 Bytes=930)
 
 
 
ops$tkyte@ORA9IR1> set autotrace off


it does the 1%, in 9iR2 - 9206 - it did this:


ops$tkyte@ORA9IR2> select * from t1 where f=:f;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=7 Card=30693 Bytes=92079)
   1    0   INDEX (FAST FULL SCAN) OF 'I_T1' (NON-UNIQUE) (Cost=7 Card=30693 Bytes=92079)

Odd cardinality November 26, 2005 - 2pm UTC

Bookmark | Bottom | Top

Reviewer: Jonathan Lewis from UK

I forgot to mention - Wolfgang Breitling has mentioned to me an oddity that appears when you create 
a histogram on a column with just one value. I think there's an article on his site 
www.centrexcc.com which mentions it.

question about the density calculation November 28, 2005 - 4am UTC

Bookmark | Bottom | Top

Reviewer: William Moffatt from London, England

Hi Tom,

I'm just curious about something in the support note: it says about the density calculation :

"The number of distinct NON-null values for a column (COL1) on table TABLE1 
can be obtained as follows:

select distinct count(COL1) 
from   TABLE1
where  COL1 is not null;
"

Firstly, isn't the "not null" implied by the count?

Secondly, shouldn't that be "count(distinct COL1)", rather than "distinct count(COL1)"?

This is just idle curiosity, since the intent is obvious, and I presume the calculation works, 
however it's done :)

(I did test it out, on 8.1.7.4, and it does appear that the query as written is incorrect, however 
the note does say pre 7.3 - but then again, the "post 7.3" section refers you back to this query 
too). I've seen you raise documentation bugs before, and similarly if I've made a mistake 
somewhere, I'd rather know what and why :)

BTW, many thanks for the marvelous resource that this site is.

regards,

William

Followup November 28, 2005 - 7am UTC:

it should just be 

select count(distinct col1) from table1;


the query presented in the note would not make sense.  

select distinct count(col1) from table1 where col1 is not null;

is the same as

select count(col1) from table1;

which would count non-null col1 values - they meant count(distinct col1)

Calculation of density July 30, 2008 - 6am UTC

Bookmark | Bottom | Top

Reviewer: Mette from DK

Thanks to this entry I now understand how we get the estimated number of rows when using histograms 
- thanks.

But I dont understand how Oracle calculates the density (used for the non popular values).

I'm on 10.2.0.3 on AIX

This is the situation:

250 buckets
23644 distinct values
34 popular (spanning more than one bucket)
1.799.977 rows
stats run with 100%

The density is 0.002267921.... (taken from TAB_COLUMNS)

How does he get that number?

I tried with the formula from above - and got nothing like this number.

215 / 23644 = 0.009093216

What have I missed here?

Best regards