本文翻译自mysql性能调优及架构设计 优化部分。主要是做个记录和增加学习的广度。
实现算法:
Nested Loop Join。本质上和我们平时使用时的直观感受一致,选定一张驱动表作为循环基础数据,然后循环该结果集中的数据作为过滤条件到下一张表内查询数据,然后合并结果。如果有多个join,则将前面的结果集作为循环条件,到后一个表中查询数据。
书中的例子:
select m.subject msg_subject, c.content msg_content
from user_group g,group_message m,group_message_content c
where g.user_id = 1
and m.group_id = g.group_id
and c.group_msg_id = m.id
userid,group_id,group_msg_id都存在索引。
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: g
type: ref
possible_keys: user_group_gid_ind,user_group_uid_ind,user_group_gid_uid_ind
key: user_group_uid_ind
key_len: 4
ref: const
rows: 2
Extra:
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: m
type: ref
possible_keys: PRIMARY,idx_group_message_gid_uid
key: idx_group_message_gid_uid
key_len: 4
ref: g.group_id
rows: 3
Extra:
*************************** 3. row ***************************
id: 1
select_type: SIMPLE
table: c
type: ref
possible_keys: idx_group_message_content_msg_id
key: idx_group_message_content_msg_id
key_len: 4
ref: m.id
rows: 2
Extra:
伪代码如下:
for each record g_rec in table user_group that g_rec.user_id=1{
for each record m_rec in group_message that m_rec.group_id=g_rec.group_id{
for each record c_rec in group_message_content that c_rec.group_msg_id=m_rec.id
pass the (g_rec.user_id, m_rec.subject, c_rec.content) row
combination to output;
}
}
如果去掉group_msg_id字段的索引,执行计划就不一样了。
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: g
type: ref
possible_keys: user_group_uid_ind
key: user_group_uid_ind
key_len: 4
ref: const
rows: 2
Extra:
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: m
type: ref
possible_keys: PRIMARY,idx_group_message_gid_uid
key: idx_group_message_gid_uid
key_len: 4
ref: g.group_id
rows: 3
Extra:
*************************** 3. row ***************************
id: 1
select_type: SIMPLE
table: c
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 96
Extra:Using where;Using join buffer
最后一张表的索引删除了,就只能全表扫描,因此使用到了join buffer,其实也就是缓存。注意到,前两张表的数据放到了缓存内,然后通过这个缓存再和第三张表进行查询。这是因为第三张表需要全表扫描了,这时候没有必要每一次都关联。。。
for each record g_rec in table user_group{
for each record m_rec in group_message that m_rec.group_id=g_rec.group_id{
put (g_rec, m_rec) into the buffer
if (buffer is full)
flush_buffer();
}
}
flush_buffer(){
for each record c_rec in group_message_content that c_rec.group_msg_id = c_rec.id{
for each record in the buffer
pass (g_rec.user_id, m_rec.subject, c_rec.content) row combination to output;
}
empty the buffer;
}
join语句的优化:
1.小结果集驱动大结果集
2.优先优化内层循环。
3.对join字段建立索引。
4.当无法建立索引时,需要注意join Buffer Size(暂时不清楚,没遇到具体场景)