[/PHP]
SQL> select count(1) from diy_userlog where username is not null;
COUNT(1)
----------
22562
SQL> select count(1) from diy_userlog;
COUNT(1)
----------
1963782
diy_userlog中username为空的数据行很多.总共196W行数据中,username不空的总共才只有2W行数据.
这个表的大小是112M
SQL> select trunc(bytes/1024/1024) from user_segments where segment_name=upper('diy_userlog');
TRUNC(BYTES/1024/1024)
----------------------
112
原来就有索引index index1_diy_userlog on diy_userlog(type,username,inputdate),大小是51M;
SQL> select trunc(bytes/1024/1024) from user_segments where segment_name=upper('index1_diy_userlog');
TRUNC(BYTES/1024/1024)
----------------------
51
因为username绝大多数为空,但inputdate基本上都不为空,所以下面创建的复合索引还是很大:44M,所以我立即就删除了.
SQL> create index index3_diy_userlog on diy_userlog(inputdate,username);
Index created
SQL> select trunc(bytes/1024/1024) from user_segments where segment_name=upper('index3_diy_userlog');
TRUNC(BYTES/1024/1024)
----------------------
44
SQL> drop index index3_diy_userlog;
Index dropped
利用索引不索引索引字段全为空的数据行,我创建了下面的复合函数索引,因为username绝大部分都是空的,所以这个索引很小,不到1M
SQL> create index index2_v_diy_userlog on diy_userlog(case when username is not null then inputdate else null end,username);
Index created
SQL> select trunc(bytes/1024/1024) from user_segments where segment_name=upper('index2_v_diy_userlog');
TRUNC(BYTES/1024/1024)
----------------------
0
为了方便使用这里建立的函数索引,建立了如下的视图:
SQL> create view v_diy_userlog as select case when username is not null then inputdate else null end virt_inputdate,username from diy_userlog;
View created
这样
select username,max(inputdate) last_inputdate
from diy_userlog
where username is not null and inputdate is not null
group by username;
我改写成了
select username,max(virt_inputdate) last_inputdate
from v_diy_userlog
where username is not null and virt_inputdate is not null
group by username;
它们是等价的.
我收集了表,索引,包括隐含列上的统计信息,但这个复合的函数索引的使用上存在问题:
SQL> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bi
PL/SQL Release 10.2.0.1.0 - Production
CORE 10.2.0.1.0 Production
TNS for Linux: Version 10.2.0.1.0 - Production
NLSRTL Version 10.2.0.1.0 - Production
SQL> show parameter rewrite;
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
query_rewrite_enabled string TRUE
query_rewrite_integrity string enforced
SQL> show parameter optimizer_mode
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
optimizer_mode string ALL_ROWS
SQL> set autotrace traceonly;
SQL> select username,max(inputdate) last_inputdate
from diy_userlog
where username is not null and inputdate is not null
group by username;
2218 rows selected.
SQL> /
2218 rows selected.
Execution Plan
----------------------------------------------------------
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2216 | 22160 | 1127 (4)|
| 1 | HASH GROUP BY | | 2216 | 22160 | 1127 (4)|
| 2 | INDEX FAST FULL SCAN| INDEX1_DIY_USERLOG | 22555 | 220K| 1124 (4)|
---------------------------------------------------------------------------------
Note
-----
- 'PLAN_TABLE' is old version
Statistics
----------------------------------------------------------
6370 consistent gets
2218 rows processed
SQL> select username,max(virt_inputdate) last_inputdate
from v_diy_userlog
where username is not null and virt_inputdate is not null
group by username;
2218 rows selected.
SQL> /
2218 rows selected.
Execution Plan
----------------------------------------------------------
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 245 | 2450 | 1126 (4)|
| 1 | HASH GROUP BY | | 245 | 2450 | 1126 (4)|
| 2 | INDEX FAST FULL SCAN| INDEX1_DIY_USERLOG | 259 | 2590 | 1125 (4)|
---------------------------------------------------------------------------------
Note
-----
- 'PLAN_TABLE' is old version
Statistics
----------------------------------------------------------
6370 consistent gets
2218 rows processed
使用视图的形式,这里根本就不使用这个复合的函数索引.
SQL> select username,max(case when username is not null then inputdate else null end) last_inputdate
from diy_userlog
where username is not null and case when username is not null then inputdate els 3 e null end is not null
group by username;
2218 rows selected.
SQL> /
2218 rows selected.
Execution Plan
----------------------------------------------------------
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 245 | 2450 | 1126 (4)|
| 1 | HASH GROUP BY | | 245 | 2450 | 1126 (4)|
| 2 | INDEX FAST FULL SCAN| INDEX1_DIY_USERLOG | 259 | 2590 | 1125 (4)|
---------------------------------------------------------------------------------
Note
-----
- 'PLAN_TABLE' is old version
Statistics
----------------------------------------------------------
6370 consistent gets
2218 rows processed
使用原表的形式,还是不使用这个复合的函数索引.
SQL> select /*+ index(v_diy_userlog index2_v_diy_userlog) */username,max(virt_inputdate) last_inputdate
2 from v_diy_userlog
3 where username is not null and virt_inputdate is not null
4 group by username;
2218 rows selected.
SQL> /
2218 rows selected.
Execution Plan
----------------------------------------------------------
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 245 | 2450 | 48 (3)|
| 1 | HASH GROUP BY | | 245 | 2450 | 48 (3)|
| 2 | TABLE ACCESS BY INDEX ROWID| DIY_USERLOG | 259 | 2590 | 47 (0)|
| 3 | INDEX FULL SCAN | INDEX2_V_DIY_USERLOG | 3 | | 45 (0)|
------------------------------------------------------------------------------------------
Note
-----
- 'PLAN_TABLE' is old version
Statistics
----------------------------------------------------------
16702 consistent gets
2218 rows processed
使用视图的形式,强制使用这个复合索引,走了索引全扫描,关键是还回访了表,所以导致使用这个更小索引的执行计划比使用更大索引的执行计划的逻辑IO还要大.
SQL> select /*+ index_ffs(v_diy_userlog index2_v_diy_userlog) */username,max(virt_inputdate) last_inputdate
from v_diy_userlog
where username is not null and virt_inputdate is not null
group by username;
2218 rows selected.
SQL> /
2218 rows selected.
Execution Plan
----------------------------------------------------------
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 245 | 2450 | 48 (3)|
| 1 | HASH GROUP BY | | 245 | 2450 | 48 (3)|
| 2 | TABLE ACCESS BY INDEX ROWID| DIY_USERLOG | 259 | 2590 | 47 (0)|
| 3 | INDEX FULL SCAN | INDEX2_V_DIY_USERLOG | 3 | | 45 (0)|
------------------------------------------------------------------------------------------
Note
-----
- 'PLAN_TABLE' is old version
Statistics
----------------------------------------------------------
16702 consistent gets
2218 rows processed
使用视图的形式,强制使用这个复合索引的快速全扫描,但还是走了和上面完全一样的执行计划.
SQL> select /*+ index_ffs(diy_userlog index2_v_diy_userlog) */username,max(case when username is not null then inputdate else null end) last_inputdate
from diy_userlog
where username is not null and case when username is not null then inputdate else null end is not null
group by username;
2218 rows selected.
SQL> /
2218 rows selected.
Execution Plan
----------------------------------------------------------
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 245 | 2450 | 48 (3)|
| 1 | HASH GROUP BY | | 245 | 2450 | 48 (3)|
| 2 | TABLE ACCESS BY INDEX ROWID| DIY_USERLOG | 259 | 2590 | 47 (0)|
| 3 | INDEX FULL SCAN | INDEX2_V_DIY_USERLOG | 3 | | 45 (0)|
------------------------------------------------------------------------------------------
Note
-----
- 'PLAN_TABLE' is old version
Statistics
----------------------------------------------------------
16702 consistent gets
2218 rows processed
使用原表的形式,强制使用这个复合索引的快速全扫描,但还是走了和上面完全一样的执行计划.
SQL> select /*+ index(diy_userlog index2_v_diy_userlog) */username,max(case when username is not null then inputdate else null end) last_inputdate
2 from diy_userlog
3 where username is not null and case when username is not null then inputdate else null end is not null
4 group by username;
2218 rows selected.
SQL> /
2218 rows selected.
Execution Plan
----------------------------------------------------------
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 245 | 2450 | 48 (3)|
| 1 | HASH GROUP BY | | 245 | 2450 | 48 (3)|
| 2 | TABLE ACCESS BY INDEX ROWID| DIY_USERLOG | 259 | 2590 | 47 (0)|
| 3 | INDEX FULL SCAN | INDEX2_V_DIY_USERLOG | 3 | | 45 (0)|
------------------------------------------------------------------------------------------
Note
-----
- 'PLAN_TABLE' is old version
Statistics
----------------------------------------------------------
16705 consistent gets
2218 rows processed
使用原表的形式,强制使用这个复合索引,但还是走了和上面完全一样的执行计划.
trusted形式呢?按说不用设置成这个的,病急乱投医吧!
SQL> alter session set query_rewrite_integrity=trusted;
Session altered.
SQL> select /*+ index(v_diy_userlog index2_v_diy_userlog) */username,max(virt_inputdate) last_inputdate
2 from v_diy_userlog
3 where username is not null and virt_inputdate is not null
4 group by username;
2218 rows selected.
SQL> /
2218 rows selected.
Execution Plan
----------------------------------------------------------
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 245 | 2450 | 48 (3)|
| 1 | HASH GROUP BY | | 245 | 2450 | 48 (3)|
| 2 | TABLE ACCESS BY INDEX ROWID| DIY_USERLOG | 259 | 2590 | 47 (0)|
| 3 | INDEX FULL SCAN | INDEX2_V_DIY_USERLOG | 3 | | 45 (0)|
------------------------------------------------------------------------------------------
Note
-----
- 'PLAN_TABLE' is old version
Statistics
----------------------------------------------------------
16705 consistent gets
2218 rows processed
SQL> select /*+ index(diy_userlog index2_v_diy_userlog) */username,max(case when username is not null then inputdate else null end) last_inputdate
2 from diy_userlog
3 where username is not null and case when username is not null then inputdate else null end is not null
4 group by username;
2218 rows selected.
SQL> /
2218 rows selected.
Execution Plan
----------------------------------------------------------
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 245 | 2450 | 48 (3)|
| 1 | HASH GROUP BY | | 245 | 2450 | 48 (3)|
| 2 | TABLE ACCESS BY INDEX ROWID| DIY_USERLOG | 259 | 2590 | 47 (0)|
| 3 | INDEX FULL SCAN | INDEX2_V_DIY_USERLOG | 3 | | 45 (0)|
------------------------------------------------------------------------------------------
Note
-----
- 'PLAN_TABLE' is old version
Statistics
----------------------------------------------------------
16705 consistent gets
2218 rows processed
还是不行呀!
未使用提示时,oracle不使用这个索引可能是因为使用这个索引代价更大(当然从cost看不是这样),明显,它的逻辑读很高.
可问题是:我通过提示强制使用这个索引的时候,它为什么要走索引全扫描,而不是索引快速全扫描呢,特别是它为什么要回访表呢?
这里要使用的所有列我这个复合索引里都有呀!
如果从产品的角度来讲,我觉得这里的username为空的记录行应该是垃圾记录,应该删除掉的.
但如果这些记录确实是需要保留的,
按现在的情况来讲,我这里的解决方案问题出在哪里呢?
或者说这里有什么其他的好的解决方案吗?
[/PHP]