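Many online resources give the number of inner-table scans for the BNLJ (Block Nested-Loop Join) algorithm as:
R * used_column_size / join_buffer_size + 1
I puzzled over this answer for a long time and it never seemed quite right, so I went to the official MySQL documentation. It describes the BNLJ algorithm with the following pseudocode, which I quote verbatim: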
For the example join described previously for the NLJ algorithm (without buffering), the join is done as follows using join buffering:
for each row in t1 matching range {
  for each row in t2 matching reference key {
    store used columns from t1, t2 in join buffer
    if buffer is full {
      for each row in t3 {
        for each t1, t2 combination in join buffer {
          if row satisfies join conditions, send to client
        }
      }
      empty join buffer
    }
  }
}

if buffer is not empty {
  for each row in t3 {
    for each t1, t2 combination in join buffer {
      if row satisfies join conditions, send to client
    }
  }
}
If S is the size of each stored t1, t2 combination in the join buffer and C is the number of combinations in the buffer, the number of times table t3 is scanned is:
(S * C)/join_buffer_size + 1
The number of t3 scans decreases as the value of join_buffer_size increases, up to the point when join_buffer_size is large enough to hold all previous row combinations. At that point, no speed is gained by making it larger.
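To make the quoted formula concrete, here is a minimal Python sketch that evaluates it for a growing join_buffer_size. The function name estimated_t3_scans and the sample numbers (1000 combinations of 100 bytes each) are made up for illustration, and integer division is my assumption, since the manual does not say how the quotient is rounded.

```python
def estimated_t3_scans(s: int, c: int, join_buffer_size: int) -> int:
    """Evaluate (S * C) / join_buffer_size + 1.

    s: size in bytes of one stored t1, t2 combination (S)
    c: number of combinations (C)
    Integer division is an assumption, not something the manual states.
    """
    return (s * c) // join_buffer_size + 1


# Hypothetical workload: 1000 combinations, 100 bytes each (S * C = 100000).
for size in (10_000, 50_000, 100_000, 200_000, 400_000):
    print(size, estimated_t3_scans(100, 1000, size))
```

With these numbers the estimate falls from 11 to 3 to 2 and then stays at 1 once the buffer is larger than S * C, which matches the statement that enlarging the buffer beyond that point gains nothing.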
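Read the pseudocode carefully, then look at that 1 again. My understanding is as follows.
What the 1 means: suppose the driving table contributes only a single column and has 10 rows in total, and the join buffer can hold 4 of them. Then 10/4 = 2 with integer division, i.e. two full join buffers. But two join buffers hold only 8 rows, so one more buffer load is needed for the remaining 2 rows, which gives 10/4 + 1 = 3. Here 10 is the value of S * C and 4 is join_buffer_size.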
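The 10-row example can be checked by replaying the buffering loop from the pseudocode. This is only a sketch under simplifying assumptions: the buffer capacity is counted in rows rather than bytes, the join conditions are ignored, and simulate_t3_scans is a hypothetical helper name, not anything from MySQL.

```python
def simulate_t3_scans(num_combinations: int, buffer_capacity: int) -> int:
    """Count how often t3 is scanned when the pseudocode's loop is replayed.

    buffer_capacity is measured in rows (an assumption made for simplicity).
    """
    scans = 0
    buffered = 0
    for _ in range(num_combinations):
        buffered += 1                    # store used columns in join buffer
        if buffered == buffer_capacity:  # "if buffer is full"
            scans += 1                   # one full pass over t3
            buffered = 0                 # "empty join buffer"
    if buffered > 0:                     # "if buffer is not empty"
        scans += 1                       # final pass for the leftover rows
    return scans


print(simulate_t3_scans(10, 4))  # 3
print(10 // 4 + 1)               # 3
```

For 10 rows and a buffer of 4, the loop and the formula agree on 3 scans, and the final "buffer is not empty" pass is exactly the +1. Note that when the row count is an exact multiple of the capacity (for example 8 rows and a buffer of 4), this replay performs 2 scans while the formula yields 3, so the formula is probably best read as an estimate rather than an exact count.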