Db2两个表做join时, where条件中如果使用了or运算符,并且or前后不是同一张表的话,那么就不会使用到指定列上的索引(无论索引的筛选性多好)。
举一个简单的例子,有A,B两张表, A表存放ID和真名, B表存放ID和昵称,表结构分别如下:
SQL> create table a(id int, real_name char(15))
SQL> create table b(id int ,nick_name char(15))
现在想通过一条SQL查询real_name是"name333"或者nick_name是"nick1024"的用户的信息,SQL语句可能如下:
select ... from a left join b on a.id = b.id where real_name = 'name333' or nick_name = 'nick1024'
这种情况下,where条件里有个or,并且real_name和nick_name不在同一个表,那么上面的SQL就不会使用real_name或者nick_name上面的索引,导致SQL性能变差。
下面来验证一下:
1. 准备一些数据
$ cat a.py
for i in range(1,5000000):
print '%d,name%d' % (i, i)
$ cat b.py
for i in range(1,5000000):
print '%d,nick%d' % (i, i)
$ python a.py > a.del
$ python b.py > b.del
2. 创建表A,B,每个列上都创建好索引,之后load第1步中生成的数据
db2 => create table a(id int, real_name char(15))
db2 => create table b(id int ,nick_name char(15))
db2 => load from a.del of del insert into a
db2 => load from b.del of del insert into b
db2 => create index a_id on a(id)
db2 => create index b_id on b(id)
db2 => create index real_name_idx on a(real_name)
db2 => create index nick_name_idx on b(nick_name)
db2 => runstats on table a on all columns with distribution and indexes all
db2 => runstats on table b on all columns with distribution and indexes all
3. 准备3条join SQL语句:
第1条SQL where条件使用or,并且or前后不是同一个表, 我们期望能使用到real_name和nick_name上的索引
$ cat 1.sql
select
a.id,
a.real_name,
b.nick_name
from
a left join b
on
a.id = b.id
where
a.real_name = 'name333'
or
b.nick_name = 'nick1024'
@
第2条SQL where条件使用and(这条SQL和SQL1功能不一样):
$ cat 2.sql
select
a.id,
a.real_name,
b.nick_name
from
a left join b
on
a.id = b.id
where
a.real_name = 'name333'
and
b.nick_name = 'nick333'
@
第3条SQL是第一条SQL的改写,将or改写为union:
$ cat 3.sql
select
a.id,
a.real_name,
b.nick_name
from
a left join b
on
a.id = b.id
where
a.real_name = 'name333'
union
select
a.id,
a.real_name,
b.nick_name
from
a left join b
on
a.id = b.id
where
b.nick_name = 'nick1024'
@
4. 分别运行3条SQL语句,看看各自用了多久,可以看到第1条SQL运行时间特别长
$ time db2 -td@ -vf 1.sql
select a.id, a.real_name, b.nick_name from a left join b on a.id = b.id where a.real_name = 'name333' or b.nick_name = 'nick1024'
ID REAL_NAME NICK_NAME
----------- --------------- ---------------
333 name333 nick333
1024 name1024 nick1024
2 record(s) selected.
real 0m18.245s
user 0m0.024s
sys 0m0.028s
$ time db2 -td@ -vf 2.sql
select a.id, a.real_name, b.nick_name from a left join b on a.id = b.id where a.real_name = 'name333' and b.nick_name = 'nick333'
ID REAL_NAME NICK_NAME
----------- --------------- ---------------
333 name333 nick333
1 record(s) selected.
real 0m0.108s
user 0m0.040s
sys 0m0.064s
$ time db2 -td@ -vf 3.sql
select a.id, a.real_name, b.nick_name from a left join b on a.id = b.id where a.real_name = 'name333' union select a.id, a.real_name, b.nick_name from a left join b on a.id = b.id where b.nick_name = 'nick1024'
ID REAL_NAME NICK_NAME
----------- --------------- ---------------
333 name333 nick333
1024 name1024 nick1024
2 record(s) selected.
real 0m0.054s
user 0m0.020s
sys 0m0.016s
如果分别查看3条SQL的访问计划如下。第1条SQL没有用到任何一个期望的索引,cost特别高:
$ db2expln -d sample -g -f 1.sql -terminator @ -terminal
Optimizer Plan:
Rows
Operator
(ID)
Cost
2
RETURN
( 1)
113791
|
2
FILTER
( 2)
113791
|
5e+06
MSJOIN
( 3)
112864
/ \-----\
5e+06 *
FETCH |
( 4) 5e+06
56047.4 FETCH
/ \ ( 7)
5e+06 5e+06 56047.4
IXSCAN Table: / \
( 5) INST105 5e+06 5e+06
21136.7 A IXSCAN Table:
| ( 8) INST105
5e+06 21136.7 B
Index: |
INST105 5e+06
A_ID Index:
INST105
B_ID
$ db2expln -d sample -g -f 2.sql -terminator @ -terminal
Optimizer Plan:
Rows
Operator
(ID)
Cost
2e-07
RETURN
( 1)
40.6211
|
2e-07
NLJOIN
( 2)
40.6211
/ \-----\
1 *
FETCH / \
( 3) 1 5e+06
27.0755 IXSCAN Table:
/ \ ( 6) INST105
1 5e+06 13.5438 A
IXSCAN Table: |
( 4) INST105 5e+06
20.3083 B Index:
| INST105
5e+06 A_ID
Index:
INST105
NICK_NAME_IDX
$ db2expln -d sample -g -f 3.sql -terminator @ -terminal
Optimizer Plan:
Rows
Operator
(ID)
Cost
1
RETURN
( 1)
81.243
|
1
TBSCAN
( 2)
81.243
|
1
SORT
( 3)
81.2427
|
2
UNION
( 4)
81.242
/------------/ \------------\
1 1
NLJOIN NLJOIN
( 5) (10)
40.621 40.621
/ \------\ / \-----\
1 * 1 *
FETCH / \ FETCH / \
( 6) 1 5e+06 (11) 1 5e+06
27.0755 IXSCAN Table: 27.0755 IXSCAN Table:
/ \ ( 9) INST105 / \ (14) INST105
1 5e+06 13.5438 A 1 5e+06 13.5438 B
IXSCAN Table: | IXSCAN Table: |
( 7) INST105 5e+06 (12) INST105 5e+06
20.3083 B Index: 20.3083 A Index:
| INST105 | INST105
5e+06 A_ID 5e+06 B_ID
Index: Index:
INST105 INST105
NICK_NAME_IDX REAL_NAME_IDX