Histogram density explained

Source: http://blog.csdn.net/cymm_liu/article/details/8846141


Matt -- Thanks for the question regarding "density", version 8.1.6

Submitted on 6-Mar-2002 11:50 UTC
Last updated 28-Nov-2005 7:40

You Asked

What does the density column in dba_tab_columns mean? 

and we said...

Here is a support note on the topic:


Article-ID:         <Note:43041.1>
Circulation:        PUBLISHED (LIMITED)
Folder:             server.Performance.SqlTuning
Topic:              Optimizer Terms
Title:              QTUNE: What is Density?
Document-Type:      BULLETIN
Impact:             MEDIUM
Skill-Level:        NOVICE
Updated-Date:       05-APR-2000 21:20:35
References:         <NOTE:50750.1> Shared-Refs:        

Content-Type:       TEXT/PLAIN
Keywords:           CBO; FEATURE; 
Products:           5/RDBMS (V7.X to V8.X); 
Platforms:          GENERIC;  

PURPOSE

To give an explanation of 'Density' as used by the CBO.

SCOPE & APPLICATION

For Oracle Support Analysts trying to understand how density is used by the
CBO when determining the best access path.

Density is a statistic used by the Cost Based Optimizer to give selectivity
estimates for columns where better information (e.g. from histograms) is
unavailable.

You must gather statistics on your table to get density statistics. 

You can see density statistics by looking at the following views:

USER_TAB_COLUMNS
ALL_TAB_COLUMNS
DBA_TAB_COLUMNS
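
For example, a minimal sketch of checking the density of a column - using a
hypothetical table T1 with a column F - after statistics have been gathered:

exec dbms_stats.gather_table_stats(user, 'T1', cascade=>true);

select column_name, num_distinct, density
from   user_tab_columns
where  table_name = 'T1'
and    column_name = 'F';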

Density is a column statistic and provides selectivity estimates for 
equi-join predicates (e.g. A.COL1 = B.COL1) and equality predicates 
(e.g. COL1 = 5).

The density is expressed as a decimal number between 0 and 1. 
Values close to 1 indicate that this column is unselective; 
values close to 0 indicate that this column is highly selective.

The more selective a column is, the fewer rows are likely to be returned by a 
query referencing this column in its predicate list. 

The column selectivity is part of the equation used to decide on the best 
access path for a query to retrieve the required data in the most effective 
manner, and hence it affects the final cost value for the query.

Density is calculated as follows:

Pre 7.3
~~~~~~~

    Density = 1 / Number of distinct NON null values

The number of distinct NON-null values for a column (COL1) on table TABLE1 
can be obtained as follows:

select distinct count(COL1) 
from   TABLE1
where  COL1 is not null;
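
For example (with made-up numbers): if that count came back as 20 distinct 
NON-null values for COL1, the pre-7.3 density would be 1/20 = 0.05.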


7.3+
~~~~

The density calculation has been refined by the use of histograms. If 
you have created histograms on your columns, the histogram information can now 
be used to give more accurate estimates. Otherwise the density is 
calculated as before. With histograms, information on popular and non-popular 
values is used to determine the selectivity. 

A non-popular value is one that does not span multiple bucket end points.
A popular value is one that spans multiple end points.

(Refer to <Note:50750.1> for details on histograms)
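
As a rough illustration (a sketch only, using a hypothetical table T1 with a 
histogram gathered on column F), the bucket end points can be inspected like this:

exec dbms_stats.gather_table_stats(user, 'T1', method_opt=>'for columns F size 254');

select endpoint_number, endpoint_value
from   user_tab_histograms
where  table_name = 'T1'
and    column_name = 'F'
order  by endpoint_number;

A value whose ENDPOINT_NUMBER jumps by more than one from the previous row spans 
multiple end points and is therefore popular; all other values are non-popular.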

For non-popular values the density is calculated as the number of non-popular 
values divided by the total number of values. Formula:

    Density =  Number of non-popular values
               ----------------------------
                  total number of values

We only use the density statistic for non-popular values. 

For popular values, the selectivity of a particular column value is calculated 
using the histogram as follows:

The Selectivity for popular values is calculated as the number of end points 
spanned by that value divided by the total number of end points. Formula:

    Selectivity = Number of end points spanned by this value
                  ------------------------------------------
                         total number of end points
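
As a worked example with made-up numbers: suppose a column holds 10000 values in 
total, of which 9700 belong to non-popular values, and its histogram has 75 end 
points. Then Density = 9700 / 10000 = 0.97, and that figure is used for the 
non-popular values; a popular value spanning 3 of the 75 end points gets 
Selectivity = 3 / 75 = 0.04.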


 

Reviews
5 stars   November 21, 2002 - 8am UTC
Reviewer:  Stephan Bressler from germany
Hi,

regarding the formula
Density =  Number of non-popular values
               ----------------------------
                  total number of values

In case there are no popular values, is the density = 1?

I thought there would be a density per bucket? Wouldn't that make sense?    


4 stars  Selectivity with bind variables (Oracle 9.2.0.5)   November 26, 2005 - 1am UTC
Reviewer:  Vladimir Sadilovskiy from MA, US
Tom,

Could you look into this test case:

SQL> create table t1 as select 1 f from all_objects;

Table created.

SQL> create index i_t1 on t1(f);

Index created.

SQL> exec dbms_stats.gather_table_stats(null,'T1',method_opt=>'for all columns size 254',estimate_percent=>100, cascade=>true);

PL/SQL procedure successfully completed.

SQL> variable f number;
SQL> exec :f := 1;

PL/SQL procedure successfully completed.

SQL> set autotrace on
with constants the plan is nice
SQL> select count(*) from t1 where f=2;

  COUNT(*)
----------
         0


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=3)
   1    0   SORT (AGGREGATE)
   2    1     INDEX (RANGE SCAN) OF 'I_T1' (NON-UNIQUE) (Cost=1 Card=1
           Bytes=3)





Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          2  consistent gets
          0  physical reads
          0  redo size
        490  bytes sent via SQL*Net to client
        655  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> select count(*) from t1 where f=1;

  COUNT(*)
----------
     45765


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=8 Card=1 Bytes=3)
   1    0   SORT (AGGREGATE)
   2    1     TABLE ACCESS (FULL) OF 'T1' (Cost=8 Card=45765 Bytes=137
          295)





Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         73  consistent gets
          0  physical reads
          0  redo size
        493  bytes sent via SQL*Net to client
        655  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

but with a bind variable the plan is different
SQL> select count(*) from t1 where f=:f;

  COUNT(*)
----------
     45765


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=3)
   1    0   SORT (AGGREGATE)
   2    1     INDEX (RANGE SCAN) OF 'I_T1' (NON-UNIQUE) (Cost=1 Card=4
          58 Bytes=1374)





Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         73  consistent gets
          0  physical reads
          0  redo size
        493  bytes sent via SQL*Net to client
        655  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

1/NDV doesn't seem to match the calculated density
SQL> select object_id from dba_objects where object_name='T1';

 OBJECT_ID
----------
     56142

SQL> set pages 999
SQL> select * from hist_head$ where obj#=56142;

      OBJ#       COL# BUCKET_CNT    ROW_CNT  CACHE_CNT   NULL_CNT TIMESTAMP
---------- ---------- ---------- ---------- ---------- ---------- ---------
SAMPLE_SIZE    MINIMUM    MAXIMUM    DISTCNT
----------- ---------- ---------- ----------
LOWVAL
----------------------------------------------------------------
HIVAL                                                               DENSITY
---------------------------------------------------------------- ----------
   INTCOL#     SPARE1     SPARE2     AVGCLN     SPARE3     SPARE4
---------- ---------- ---------- ---------- ---------- ----------
     56142          1      45765          1          1          0 25-NOV-05
      45765          1          1          1
C102
C102                                                             .000010925
         1          1          2          3

Thank you.

- Vladimir 


Followup   November 26, 2005 - 12pm UTC:

autotrace (explain plan) does not BIND PEEK - so it can "lie"

To demonstrate this and the effect of bind variable peeking, I'll run this script:


create table t
as
select 99 id, a.* from all_objects a;

update t set id = 1 where rownum = 1;

create index t_idx on t(id);
begin
   dbms_stats.gather_table_stats
   ( user, 'T',
     method_opt=>'for all indexed columns size 254',
     cascade=>TRUE );
end;
/
variable n number
alter session set sql_trace=true;
exec :n := 1
select count(object_type) from t n_is_1_first where id = :n;
exec :n := 99
select count(object_type) from t n_is_1_first where id = :n;
exec :n := 99
select count(object_type) from t n_is_99_first where id = :n;
exec :n := 1
select count(object_type) from t n_is_99_first where id = :n;



The tkprof was generated with:

tkprof tracefile outputfile aggregate=no explain=u/p

to get both an EXPLAIN PLAN as well as getting all four statements printed individually:

we start with :N set to one; the optimizer will actually optimize this statement as if the literal 
number one was in the query itself:


BEGIN :n := 1; END;
********************************************************************************
select count(object_type) from t n_is_1_first where id = :n

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 70  (OPS$TKYTE)

we did a hard parse (misses = 1), the number one was put in place of :n and the query optimizer 
said "use index! it'll only get one row!".  In the following the row source operation plan is the 
ACTUAL PLAN used, the EXECUTION PLAN is what explain plan - which DOES NOT BIND PEEK - would give 
us.  NOTE THEY ARE DIFFERENT.  Explain plan doesn't have access to the binds:

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE
      1   TABLE ACCESS BY INDEX ROWID T
      1    INDEX RANGE SCAN T_IDX (object id 36010)


Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   GOAL: CHOOSE
      1   SORT (AGGREGATE)
      1    TABLE ACCESS   GOAL: ANALYZED (FULL) OF 'T'
********************************************************************************
BEGIN :n := 99; END;
********************************************************************************
select count(object_type) from t n_is_1_first where id = :n

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 70  (OPS$TKYTE)

This query was soft parsed, the plan already existed.  This too will therefore use an index range 
scan - even though in this case, a full scan would have been "better".  Again, explain plan "lies" 
to us here


Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE
  30689   TABLE ACCESS BY INDEX ROWID T
  30689    INDEX RANGE SCAN T_IDX (object id 36010)


Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   GOAL: CHOOSE
      1   SORT (AGGREGATE)
  30689    TABLE ACCESS   GOAL: ANALYZED (FULL) OF 'T'
********************************************************************************
BEGIN :n := 99; END;
********************************************************************************
select count(object_type) from t n_is_99_first where id = :n

Note that this is a hard parse (changed the correlation name in the query, new query).  This time 
- the optimizer plugs 99 into the query and optimizes.  We do full scan this time:

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 70  (OPS$TKYTE)

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE
  30689   TABLE ACCESS FULL T


Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   GOAL: CHOOSE
      1   SORT (AGGREGATE)
  30689    TABLE ACCESS   GOAL: ANALYZED (FULL) OF 'T'
********************************************************************************
BEGIN :n := 1; END;
********************************************************************************
select count(object_type) from t n_is_99_first where id = :n

but note that this is a soft parse, so the full scan plan will be used - the first one to run the query 
with bind variable inputs (the first hard parse of it) gets to "pick the plan".... So, basically the 
same query - just a changed correlation name - and we have different plans:


Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 70  (OPS$TKYTE)

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE
      1   TABLE ACCESS FULL OBJ#(36009)


Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   GOAL: CHOOSE
      1   SORT (AGGREGATE)
      1    TABLE ACCESS   GOAL: ANALYZED (FULL) OF 'T'


You would need to use a 10053 trace to see the "true" cardinality in this case
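
(For reference, a minimal sketch of producing such a trace - run it in the session 
that issues the query; the trace file appears in the directory pointed to by 
user_dump_dest:)

alter session set events '10053 trace name context forever, level 1';
-- run the statement of interest here, then switch the trace off:
alter session set events '10053 trace name context off';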
 

5 stars  Unexpected cardinality   November 26, 2005 - 1pm UTC
Reviewer:  Jonathan Lewis from UK
Test case does not reproduce on my 9.2.0.6


Note that what you have looks like the standard 1% estimate for function(col) = constant.  As if 
Oracle is treating this like a coercion problem.


select count(*) from t1 where f=:f;

  COUNT(*)
----------
     45765

Execution Plan
----------------------------------------------------------
0   SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=3)
1 0   SORT (AGGREGATE)
2 1     INDEX (RANGE SCAN) OF 'I_T1' (NON-UNIQUE) (Cost=1 Card=458 Bytes=1374)

    card = 458 = ceil(45765/100).


My plan gives card = {actual number of rows in table}.

What do you see as the filter_predicates if you run the bind variable version (with a to_number() 
on the bind) through dbms_xplan ?
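
(One way to see them - a sketch, assuming DBMS_XPLAN is available, as it is from 
9.2 onwards:)

explain plan for
  select count(*) from t1 where f = to_number(:f);

select * from table(dbms_xplan.display());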

 


Followup   November 26, 2005 - 4pm UTC:

I didn't even look at the numbers to tell the truth, I saw "bind variable vs literal".....


but yes, I don't see it in 9206 - but in 9iR1:

ops$tkyte@ORA9IR1> create table t1 as select 1 f from all_objects;
 
Table created.
 
ops$tkyte@ORA9IR1>
ops$tkyte@ORA9IR1> create index i_t1 on t1(f);
 
Index created.
 
ops$tkyte@ORA9IR1>
ops$tkyte@ORA9IR1> exec dbms_stats.gather_table_stats(null,'T1',method_opt=>'for all columns size 254',estimate_percent=>99.99, cascade=>true);
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR1>
ops$tkyte@ORA9IR1> variable f number
ops$tkyte@ORA9IR1> set autotrace traceonly explain
ops$tkyte@ORA9IR1> select * from t1 where f=1;
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=8 Card=31020 Bytes=93060)
   1    0   TABLE ACCESS (FULL) OF 'T1' (Cost=8 Card=31020 Bytes=93060)
 
 
 
ops$tkyte@ORA9IR1> select * from t1 where f=2;
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=1 Bytes=3)
   1    0   INDEX (RANGE SCAN) OF 'I_T1' (NON-UNIQUE) (Cost=1 Card=1 Bytes=3)
 
 
 
ops$tkyte@ORA9IR1> select * from t1 where f=:f;
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=310 Bytes=930)
   1    0   INDEX (RANGE SCAN) OF 'I_T1' (NON-UNIQUE) (Cost=1 Card=310 Bytes=930)
 
 
 
ops$tkyte@ORA9IR1> set autotrace off


it does the 1% (Card=310 is roughly 1% of the 31020 rows) - in 9iR2 - 9206 - it did this:


ops$tkyte@ORA9IR2> select * from t1 where f=:f;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=7 Card=30693 Bytes=92079)
   1    0   INDEX (FAST FULL SCAN) OF 'I_T1' (NON-UNIQUE) (Cost=7 Card=30693 Bytes=92079)



 

5 stars  Odd cardinality   November 26, 2005 - 2pm UTC
Reviewer:  Jonathan Lewis from UK
I forgot to mention - Wolfgang Breitling has mentioned to me an oddity that appears when you create 
a histogram on a column with just one value. I think there's an article on his site 
www.centrexcc.com which mentions it.
 


5 stars  question about the density calculation   November 28, 2005 - 4am UTC
Reviewer:  William Moffatt from London, England
Hi Tom,

I'm just curious about something in the support note: it says this about the density calculation:

"The number of distinct NON-null values for a column (COL1) on table TABLE1 
can be obtained as follows:

select distinct count(COL1) 
from   TABLE1
where  COL1 is not null;
"

Firstly, isn't the "not null" implied by the count?

Secondly, shouldn't that be "count(distinct COL1)", rather than "distinct count(COL1)"?

This is just idle curiosity, since the intent is obvious, and I presume the calculation works, 
however it's done :)

(I did test it out, on 8.1.7.4, and it does appear that the query as written is incorrect, however 
the note does say pre 7.3 - but then again, the "post 7.3" section refers you back to this query 
too). I've seen you raise documentation bugs before, and similarly if I've made a mistake 
somewhere, I'd rather know what and why :)

BTW, many thanks for the marvelous resource that this site is.

regards,

William
 


Followup   November 28, 2005 - 7am UTC:

it should just be 

select count(distinct col1) from table1;


the query presented in the note would not make sense.  

select distinct count(col1) from table1 where col1 is not null;

is the same as

select count(col1) from table1;

which would count non-null col1 values - they meant count(distinct col1) 

5 stars  Calculation of density   July 30, 2008 - 6am UTC
Reviewer:  Mette from DK
Thanks to this entry I now understand how we get the estimated number of rows when using histograms 
- thanks.

But I don't understand how Oracle calculates the density (used for the non-popular values).

I'm on 10.2.0.3 on AIX

This is the situation:

250 buckets
23644 distinct values
34 popular (spanning more than one bucket)
1,799,977 rows
stats run with 100%

The density is 0.002267921.... (taken from TAB_COLUMNS)

How does Oracle arrive at that number?

I tried with the formula from above - and got nothing like this number.

215 / 23644 = 0.009093216

What have I missed here?

Best regards
