# Matt -- Thanks for the question regarding "density", version 8.1.6

Submitted on 6-Mar-2002 11:50 UTC
Last updated 28-Nov-2005 7:40

What does the density column in dba_tab_columns mean?

## and we said...

Here is a support note on the topic:

Article-ID:         <Note:43041.1>
Circulation:        PUBLISHED (LIMITED)
Folder:             server.Performance.SqlTuning
Topic:              Optimizer Terms
Title:              QTUNE: What is Density?
Document-Type:      BULLETIN
Impact:             MEDIUM
Skill-Level:        NOVICE
Updated-Date:       05-APR-2000 21:20:35
References:         <NOTE:50750.1> Shared-Refs:

Content-Type:       TEXT/PLAIN
Keywords:           CBO; FEATURE;
Products:           5/RDBMS (V7.X to V8.X);
Platforms:          GENERIC;

PURPOSE

To give an explanation of 'Density' as used by the CBO.

SCOPE & APPLICATION

For Oracle Support Analysts trying to understand how density is used by the
CBO when determining the best access path.

Density is a statistic used by the Cost Based Optimizer to give selectivity
estimates for columns where better information is unavailable (i.e. from
histograms etc.).

You must gather statistics on your table to get density statistics.

You can see density statistics by looking at the following views:

USER_TAB_COLUMNS
ALL_TAB_COLUMNS
DBA_TAB_COLUMNS

Density is a column statistic and provides selectivity estimates for
equi-join predicates (e.g. and A.COL1 = B.COL1) and equality predicates
(e.g. COL1 = 5).

The density is expressed as a decimal number between 0 and 1.
Values close to 1 indicate that this column is unselective
Values close to 0 indicate that this column is highly selective

The more selective a column, the less rows are likely to be returned by a
query referencing this column in its predicate list.

The column selectivity is part of the equation used to decide on the best
path for a query to take to retrieve the data required in the most effective
manner and hence impacts the final cost value for the query.

Density is calculated as follows:

Pre 7.3
~~~~~~~

Density = 1 / Number of distinct NON null values

The number of distinct NON-null values for a column (COL1) on table TABLE1
can be obtained as follows:

select distinct count(COL1)
from   TABLE1
where  COL1 is not null;

7.3+
~~~~

The Density calculation has been refined by the use of histograms. If
you have created histograms on your columns we can now use the histogram
information to give more accurate information. Otherwise the Density is
calculated as before. With histograms we can use information on
popular and non-popular values to determine the selectivity.

A non-popular value is one that does not span multiple bucket end points.
A popular value is one that spans multiple end points.

(Refer to <Note:50750.1> for details on histograms)

For non-popular values the density is calculated as the number of non-popular
values divided by the total number of values. Formula:

Density =  Number of non-popular values
----------------------------
total number of values

We only use the density statistic for non-popular values.

Popular values calculate the selectivity of a particular column values by
using histograms as follows:

The Selectivity for popular values is calculated as the number of end points
spanned by that value divided by the total number of end points. Formula:

Selectivity = Number of end points spanned by this value
------------------------------------------
total number of end points

[Top of Page]



Reviews
November 21, 2002 - 8am UTC
Reviewer: Stephan Bressler from germany
Hi,

what's the formula
Density =  Number of non-popular values
----------------------------
total number of values

In case there are no popular values, is the density = 1?

I thought there would be a density per bucket? Wouldn't that make sense?    

Selectivity with bind variables (Oracle 9.2.0.5)   November 26, 2005 - 1am UTC
Tom,

Could you look into this test case:

SQL> create table t1 as select 1 f from all_objects;

Table created.

SQL> create index i_t1 on t1(f);

Index created.

SQL> exec dbms_stats.gather_table_stats(null,'T1',method_opt=>'for all columns size

PL/SQL procedure successfully completed.

SQL> variable f number;
SQL> exec :f := 1;

PL/SQL procedure successfully completed.

SQL> set autotrace on
<b> with constants the plan is nice</b>
SQL> select count(*) from t1 where f=2;

COUNT(*)
----------
0

Execution Plan
----------------------------------------------------------
0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=3)
1    0   SORT (AGGREGATE)
2    1     INDEX (RANGE SCAN) OF 'I_T1' (NON-UNIQUE) (Cost=1 Card=1
Bytes=3)

Statistics
----------------------------------------------------------
0  recursive calls
0  db block gets
2  consistent gets
0  redo size
490  bytes sent via SQL*Net to client
655  bytes received via SQL*Net from client
2  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
1  rows processed

SQL> select count(*) from t1 where f=1;

COUNT(*)
----------
45765

Execution Plan
----------------------------------------------------------
0      SELECT STATEMENT Optimizer=CHOOSE (Cost=8 Card=1 Bytes=3)
1    0   SORT (AGGREGATE)
2    1     TABLE ACCESS (FULL) OF 'T1' (Cost=8 Card=45765 Bytes=137
295)

Statistics
----------------------------------------------------------
0  recursive calls
0  db block gets
73  consistent gets
0  redo size
493  bytes sent via SQL*Net to client
655  bytes received via SQL*Net from client
2  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
1  rows processed

<b>but with bind variable the plan is different</b>
SQL> select count(*) from t1 where f=:f;

COUNT(*)
----------
45765

Execution Plan
----------------------------------------------------------
0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=3)
1    0   SORT (AGGREGATE)
2    1     INDEX (RANGE SCAN) OF 'I_T1' (NON-UNIQUE) (Cost=1 Card=4
58 Bytes=1374)

Statistics
----------------------------------------------------------
0  recursive calls
0  db block gets
73  consistent gets
0  redo size
493  bytes sent via SQL*Net to client
655  bytes received via SQL*Net from client
2  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
1  rows processed

<b>1/NDV doesn't seem to match calculated density </b>
SQL> select object_id from dba_objects where object_name='T1';

OBJECT_ID
----------
56142

SQL> set pages 999
SQL> select * from hist_head$where obj#=56142; OBJ# COL# BUCKET_CNT ROW_CNT CACHE_CNT NULL_CNT TIMESTAMP ---------- ---------- ---------- ---------- ---------- ---------- --------- SAMPLE_SIZE MINIMUM MAXIMUM DISTCNT ----------- ---------- ---------- ---------- LOWVAL ---------------------------------------------------------------- HIVAL DENSITY ---------------------------------------------------------------- ---------- INTCOL# SPARE1 SPARE2 AVGCLN SPARE3 SPARE4 ---------- ---------- ---------- ---------- ---------- ---------- 56142 1 45765 1 1 0 25-NOV-05 45765 1 1 1 C102 C102 .000010925 1 1 2 3 Thank you. - Vladimir  Followup November 26, 2005 - 12pm UTC: autotrace (explain plan) do not BIND PEEK - so they can "lie" To demnonstrate this and the effect of bind variable peeking, I'll run this script: create table t as select 99 id, a.* from all_objects a; update t set id = 1 where rownum = 1; create index t_idx on t(id); begin dbms_stats.gather_table_stats ( user, 'T', method_opt=>'for all indexed columns size 254', cascade=>TRUE ); end; / variable n number alter session set sql_trace=true; exec :n := 1 select count(object_type) from t n_is_1_first where id = :n; exec :n := 99 select count(object_type) from t n_is_1_first where id = :n; exec :n := 99 select count(object_type) from t n_is_99_first where id = :n; exec :n := 1 select count(object_type) from t n_is_99_first where id = :n; The tkprof was generated with: tkprof tracefile outputfile aggregate=no explain=u/p to get both an EXPLAIN PLAN as well as getting all four statements printed individually: we start with :N of one, the optimizer will actually optimize this statement as if the literal number one was in the query itself: BEGIN :n := 1; END; ******************************************************************************** select count(object_type) from t n_is_1_first where id = :n Misses in library cache during parse: 1 Optimizer goal: CHOOSE Parsing user id: 70 (OPS$TKYTE)

we did a hard parse (misses = 1), the number one was put in place of :n and the query optimizer
said "use index! it'll only get one row!".  In the following the row source operation plan is the
ACTUAL PLAN used, the EXECUTION PLAN is what explain plan - which DOES NOT BIND PEEK - would give
us.  NOTE THEY ARE DIFFERENT.  Explain plan doesn't have access to the binds:

Rows     Row Source Operation
-------  ---------------------------------------------------
1  SORT AGGREGATE
1   TABLE ACCESS BY INDEX ROWID T
1    INDEX RANGE SCAN T_IDX (object id 36010)

Rows     Execution Plan
-------  ---------------------------------------------------
0  SELECT STATEMENT   GOAL: CHOOSE
1   SORT (AGGREGATE)
1    TABLE ACCESS   GOAL: ANALYZED (FULL) OF 'T'
********************************************************************************
BEGIN :n := 99; END;
********************************************************************************
select count(object_type) from t n_is_1_first where id = :n

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 70  (OPS$TKYTE) This query was soft parsed, the plan already existed. This too will therefore use an index range scan - even though in this case, a full scan would have been "better". Again, explain plan "lies" to us here Rows Row Source Operation ------- --------------------------------------------------- 1 SORT AGGREGATE 30689 TABLE ACCESS BY INDEX ROWID T 30689 INDEX RANGE SCAN T_IDX (object id 36010) Rows Execution Plan ------- --------------------------------------------------- 0 SELECT STATEMENT GOAL: CHOOSE 1 SORT (AGGREGATE) 30689 TABLE ACCESS GOAL: ANALYZED (FULL) OF 'T' ******************************************************************************** BEGIN :n := 99; END; ******************************************************************************** select count(object_type) from t n_is_99_first where id = :n Note that this is a hard parse (changed the correlation name in the query, new query). This time - the optimizer plugs 99 into the query and optimizes. We do full scan this time: Misses in library cache during parse: 1 Optimizer goal: CHOOSE Parsing user id: 70 (OPS$TKYTE)

Rows     Row Source Operation
-------  ---------------------------------------------------
1  SORT AGGREGATE
30689   TABLE ACCESS FULL T

Rows     Execution Plan
-------  ---------------------------------------------------
0  SELECT STATEMENT   GOAL: CHOOSE
1   SORT (AGGREGATE)
30689    TABLE ACCESS   GOAL: ANALYZED (FULL) OF 'T'
********************************************************************************
BEGIN :n := 1; END;
********************************************************************************
select count(object_type) from t n_is_99_first where id = :n

but note that this is a soft parse, the full scan plan will be used - first one to run the query
with bind variable inputs (first hard parse of it) gets to to "pick the plan".... So, basically the
same query - just changed the correlation name and we have different plans:

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 70  (OPS$TKYTE) Rows Row Source Operation ------- --------------------------------------------------- 1 SORT AGGREGATE 1 TABLE ACCESS FULL OBJ#(36009) Rows Execution Plan ------- --------------------------------------------------- 0 SELECT STATEMENT GOAL: CHOOSE 1 SORT (AGGREGATE) 1 TABLE ACCESS GOAL: ANALYZED (FULL) OF 'T' You would need to use a 10053 trace to see the "true" cardinality in this case  Unexpected cardinality November 26, 2005 - 1pm UTC Reviewer: Jonathan Lewis from UK Test case does not reproduce on my 9.2.0.6 Note that what you have looks like the standard 1% estimate for function(col) = constant. As if Oracle is treating this like a coercion problem. select count(*) from t1 where f=:f; COUNT(*) ---------- 45765 Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=3) 1 0 SORT (AGGREGATE) 2 1 INDEX (RANGE SCAN) OF 'I_T1' (NON-UNIQUE) (Cost=1 Card=458 Bytes=1374) card = 458 = ceil(45765/100). My plan gives card={actual number of rows in table). What do you see as the filter_predicates if you run the bind variable version (with a to_number() on the bind) through dbms_xplan ?  Followup November 26, 2005 - 4pm UTC: I didn't even look at the numbers to tell the truth, I saw "bind variable vs literal"..... but yes, I don't see it in 9206 - but in 9iR1: ops$tkyte@ORA9IR1> create table t1 as select 1 f from all_objects;

Table created.

ops$tkyte@ORA9IR1> ops$tkyte@ORA9IR1> create index i_t1 on t1(f);

Index created.

ops$tkyte@ORA9IR1> ops$tkyte@ORA9IR1> exec dbms_stats.gather_table_stats(null,'T1',method_opt=>'for all columns size

PL/SQL procedure successfully completed.

ops$tkyte@ORA9IR1> ops$tkyte@ORA9IR1> variable f number
ops$tkyte@ORA9IR1> set autotrace traceonly explain ops$tkyte@ORA9IR1> select * from t1 where f=1;

Execution Plan
----------------------------------------------------------
0      SELECT STATEMENT Optimizer=CHOOSE (Cost=8 Card=31020 Bytes=93060)
1    0   TABLE ACCESS (FULL) OF 'T1' (Cost=8 Card=31020 Bytes=93060)

ops$tkyte@ORA9IR1> select * from t1 where f=2; Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=1 Bytes=3) 1 0 INDEX (RANGE SCAN) OF 'I_T1' (NON-UNIQUE) (Cost=1 Card=1 Bytes=3) ops$tkyte@ORA9IR1> select * from t1 where f=:f;

Execution Plan
----------------------------------------------------------
0      SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=310 Bytes=930)
1    0   INDEX (RANGE SCAN) OF 'I_T1' (NON-UNIQUE) (Cost=1 Card=310 Bytes=930)

ops$tkyte@ORA9IR1> set autotrace off it does the 1%, in 9iR2 - 9206 - it did this: ops$tkyte@ORA9IR2> select * from t1 where f=:f;

Execution Plan
----------------------------------------------------------
0      SELECT STATEMENT Optimizer=CHOOSE (Cost=7 Card=30693 Bytes=92079)
1    0   INDEX (FAST FULL SCAN) OF 'I_T1' (NON-UNIQUE) (Cost=7 Card=30693 Bytes=92079)



Odd cardinality   November 26, 2005 - 2pm UTC
Reviewer: Jonathan Lewis from UK
I forgot to mention - Wolfgang Breitling has mentioned to me an oddity that appears when you create
a histogram on a column with just one value. I think there's an article on his site
www.centrexcc.com which mentions it.


question about the density calculation   November 28, 2005 - 4am UTC
Reviewer: William Moffatt from London, England
Hi Tom,

I'm just curious about something in the support note: it says about the density calculation :

"The number of distinct NON-null values for a column (COL1) on table TABLE1
can be obtained as follows:

select distinct count(COL1)
from   TABLE1
where  COL1 is not null;
"

Firstly, isn't the "not null" implied by the count?

Secondly, shouldn't that be "count(distinct COL1)", rather than "distinct count(COL1)"?

This is just idle curiosity, since the intent is obvious, and I presume the calculation works,
however it's done :)

(I did test it out, on 8.1.7.4, and it does appear that the query as written is incorrect, however
the note does say pre 7.3 - but then again, the "post 7.3" section refers you back to this query
too). I've seen you raise documentation bugs before, and similarly if I've made a mistake
somewhere, I'd rather know what and why :)

BTW, many thanks for the marvelous resource that this site is.

regards,

William


Followup   November 28, 2005 - 7am UTC:

it should just be

select count(distinct col1) from table1;

the query presented in the note would not make sense.

select distinct count(col1) from table1 where col1 is not null;

is the same as

select count(col1) from table1;

which would count non-null col1 values - they meant count(distinct col1) 

Calculation of density   July 30, 2008 - 6am UTC
Reviewer: Mette from DK
Thanks to this entry I now understand how we get the estimated number of rows when using histograms
- thanks.

But I dont understand how Oracle calculates the density (used for the non popular values).

I'm on 10.2.0.3 on AIX

This is the situation:

250 buckets
23644 distinct values
34 popular (spanning more than one bucket)
1.799.977 rows
stats run with 100%

The density is 0.002267921.... (taken from TAB_COLUMNS)

How does he get that number?

I tried with the formula from above - and got nothing like this number.

215 / 23644 = 0.009093216

What have I missed here?

Best regards