http://jonathanlewis.wordpress.com/2010/09/20/frequency-histograms-2/
原文请自备梯子。
Frequency histogram的概念请看官方文档:
http://docs.oracle.com/cd/E11882_01/server.112/e41573/stats.htm#i41591
Jonathan在这篇文章里介绍了怎么去解读数字型字段的Frequency Histogram。下面给出源代码:
create table t1(n1 number not null);
--insert 100 rows, 10 values (10,20...100)
insert into t1
select 10* round(dbms_random.value(1,10)) from dual connect by level <= 100;
commit;
select n1, count(*) from t1 group by n1 order by n1;
N1 COUNT(*)
---------- ----------
10 9
20 12
30 16
40 11
50 11
60 8
70 7
80 12
90 8
100 6
--Gathering statistics, size 254 guarantees the creation of frequency histogram
exec dbms_stats.gather_table_stats(user, 'T1', method_opt=>'FOR COLUMNS N1 SIZE 254',cascade=>true);
--Viewing histogram info
select *
from user_tab_histograms
where table_name='T1' and column_name='N1'
order by endpoint_number;
--Getting the occurancy of each value.
select
endpoint_value column_value,
endpoint_number - nvl(prev_endpoint,0) frequency
from (
select
endpoint_number,
lag(endpoint_number,1) over(order by endpoint_number) prev_endpoint,
endpoint_value
from user_tab_histograms
where table_name = 'T1' and column_name = 'N1'
)
order by endpoint_number;
COLUMN_VALUE FREQUENCY
------------ ----------
10 9
20 12
30 16
40 11
50 11
60 8
70 7
80 12
90 8
100 6
提示:
endpoint value - 字段的值
endpoint number - 累加型值重复次数。oracle保证顺序性。
为何Frequency Histogram的数据会失真?
http://jonathanlewis.wordpress.com/2010/09/24/frequency-histogram-3/
这篇文章中,Jonathan介绍了一个原因。收集统计数据时取样大小。
Date型字段的Frequency Histogram的分析代码:
create table t2(d1 date not null);
insert into t2
select sysdate - round(dbms_random.value(1,10)) from dual connect by level <= 10;
commit;
exec dbms_stats.gather_table_stats(user, 'T2', method_opt=>'FOR COLUMNS D1 SIZE 254',cascade=>true);
select d1, count(*) from t2
group by d1
order by d1;
D1 COUNT(*)
--------- ----------
17-DEC-13 2
19-DEC-13 1
20-DEC-13 2
21-DEC-13 1
22-DEC-13 3
23-DEC-13 1
select ep_value column_value, endpoint_number - nvl(prev_endpoint,0) frequency
from (
select endpoint_number,
lag(endpoint_number,1) over(order by endpoint_number) prev_endpoint,
to_date(to_char(endpoint_value,'FM99999999') || '.' || to_char(86400 * mod(endpoint_value,1),'FM99999'),'J.sssss') ep_value
from user_tab_histograms
where table_name = 'T2' and column_name = 'D1'
)
order by endpoint_number;
COLUMN_VALUE FREQUENCY
------------ ----------
17-DEC-13 2
19-DEC-13 1
20-DEC-13 2
21-DEC-13 1
22-DEC-13 3
23-DEC-13 1
DATE类型在内部存储为数字类型,小数点后是小时分秒部分。故而需要将其解析成可读的类型。