当用like‘%%’查询的时候,普通的索引是失效的,这种场景是可以通过建立全文索引的扫描而快速得到查询结果,这是它的优点。下面来做一个实验,我的环境上有一张表,有一个字段存放的是用户地址的信息:
SQL> set autotrace traceonly
SQL> set timing on
SQL> select DISTINCT(C.NAME) from test C where C.Usaddr like '%下梅林%';
已选择39行。
已用时间: 00: 00: 05.50
执行计划
----------------------------------------------------------
Plan hash value: 2203132549
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5611 | 273K| | 15989 (2)| 00:03:12 |
| 1 | HASH UNIQUE | | 5611 | 273K| 7968K| 15989 (2)| 00:03:12 |
|* 2 | TABLE ACCESS FULL| TEST | 129K| 6322K| | 15364 (2)| 00:03:05 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("C"."USADDR" LIKE '%下梅林%')
统计信息
----------------------------------------------------------
0 recursive calls
0 db block gets
68689 consistent gets
40041 physical reads
0 redo size
1258 bytes sent via SQL*Net to client
407 bytes received via SQL*Net from client
4 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
39 rows processed
---换一种写法看能不能奏效。
SQL> select DISTINCT( C.NAME) from test C where instr(C.Usaddr,'下梅林')>0;
已选择39行。
已用时间: 00: 00: 05.50
执行计划
----------------------------------------------------------
Plan hash value: 2203132549
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5611 | 273K| | 16012 (2)| 00:03:13 |
| 1 | HASH UNIQUE | | 5611 | 273K| 7968K| 16012 (2)| 00:03:13 |
|* 2 | TABLE ACCESS FULL| TEST | 129K| 6322K| | 15386 (2)| 00:03:05 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(INSTR("C"."USADDR",'下梅林')>0)
统计信息
----------------------------------------------------------
0 recursive calls
0 db block gets
68689 consistent gets
40000 physical reads
0 redo size
1258 bytes sent via SQL*Net to client
407 bytes received via SQL*Net from client
4 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
39 rows processed
---建立全文索引,按照全文索引的语法写。
SQL> create index INDEX_USADDR on test(Usaddr) indextype is ctxsys.context;
SQL> select DISTINCT( C.NAME) from test c where contains( C.Usaddr,'下梅林')>0;
已选择39行。
已用时间: 00: 00: 00.03
执行计划
----------------------------------------------------------
Plan hash value: 2636393893
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1157 | 71734 | 5 (20)| 00:00:01 |
| 1 | HASH UNIQUE | | 1157 | 71734 | 5 (20)| 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID| TEST | 1295 | 80290 | 4 (0)| 00:00:01 |
|* 3 | DOMAIN INDEX | INDEX_USADDR | | | 4 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("CTXSYS"."CONTAINS"("C"."USADDR",'下梅林')>0)
统计信息
----------------------------------------------------------
13 recursive calls
0 db block gets
1806 consistent gets
1 physical reads
0 redo size
1258 bytes sent via SQL*Net to client
407 bytes received via SQL*Net from client
4 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
39 rows processed
SQL> select bytes/1024/1024 from user_segments s where s.segment_name='TEST';
BYTES/1024/1024
---------------
546
SQL> select sum(bytes/1024/1024) from user_segments s where s.segment_name like '%INDEX_USADDR%';
SUM(BYTES/1024/1024)
--------------------
158.0625
SQL> select segment_name from user_segments s where s.segment_name like '%INDEX_USADDR%';
SEGMENT_NAME
---------------------------------------------------------------------------------
DR$INDEX_USADDR$I
DR$INDEX_USADDR$R
DR$INDEX_USADDR$X
全文索引的缺点:
1.全文索引导致磁盘资源的大量占用,如上述的例子中,表占用的空间为546M,然而单个字段的索引就占了158M。全文索引本身就是一个利用磁盘空间换取性能的方法。全文索引大的原因是,按照某种语言来进行分词。
2.更新字段值,全文索引的索引不会自动更新,索引定期维护,以及表本身的维护操作使得这个表的管理成本大大的增加。
3.使用全文索引并不是对应用透明的。如果要想利用全文索引,必须修改查询语句。原有的查询语句是不可能利用全文索引的,需要改成全文索引规定的语法。全文索引不会影响到其他的SQL语句。
4.全文索引自身还有些缺陷(我想是用的少的缘故)。
建议使用场景:
表的数据比较固定(没有DML操作),不是核心的功能。