left join 完,行数一定等于左表行数吗? 不一定!!
实验如下:
SELECT * FROM hdd_test1 a LEFT JOIN hdd_test2 b ON a.a_id = b.b_id LIMIT 144
8 rows (134ms)
id | created_at | updated_at | a_id | id | created_at | updated_at | b_id |
---|---|---|---|---|---|---|---|
1 | 2018-10-09 10:10:10.0 | 2018-10-09 10:10:10.0 | 1 | 1 | 2018-10-09 10:10:10.0 | 2018-10-09 10:10:10.0 | 1 |
2 | 2018-10-09 10:10:10.0 | 2018-10-09 10:10:10.0 | 2 | 2 | 2018-10-09 10:10:10.0 | 2018-10-09 10:10:10.0 | 2 |
1 | 2018-10-09 10:10:10.0 | 2018-10-09 10:10:10.0 | 1 | 3 | 2018-11-05 20:44:00.0 | 2018-11-05 20:44:00.0 | 1 |
1 | 2018-10-09 10:10:10.0 | 2018-10-09 10:10:10.0 | 1 | 4 | 2018-11-05 20:44:04.0 | 2018-11-05 20:44:04.0 | 1 |
3 | 2018-10-09 10:10:10.0 | 2018-10-09 10:10:10.0 | 3 | ||||
4 | 2018-10-09 10:10:10.0 | 2018-10-09 10:10:10.0 | 4 | ||||
5 | 2018-10-09 10:10:10.0 | 2018-10-09 10:10:10.0 | 5 | ||||
6 | 2018-10-09 10:10:10.0 | 2018-10-09 10:10:10.0 | 6 |
hdd_test1有6条记录, hdd_test2 有4条记录,直接left join ,产生6条记录,别高兴太早,看下面的查询:
SELECT * FROM hdd_test1 a LEFT JOIN hdd_test2 b ON a.a_id = b.b_id WHERE b.b_id > 1 LIMIT 144
1 row (37ms)查看CSV
id | created_at | updated_at | a_id | id | created_at | updated_at | b_id |
---|---|---|---|---|---|---|---|
2 | 2018-10-09 10:10:10.0 | 2018-10-09 10:10:10.0 | 2 | 2 | 2018-10-09 10:10:10.0 | 2018-10-09 10:10:10.0 | 2 |
如图所示,当where 条件在外,会先聚合后过滤,所以结果会变少
还有一种情况需要注意,left join 结果的行数也可能大于左表:
SELECT * FROM hdd_test1 LIMIT 144
2 rows (33ms)查看CSV
id | created_at | updated_at | a_id |
---|---|---|---|
1 | 2018-10-09 10:10:10.0 | 2018-10-09 10:10:10.0 | 1 |
2 | 2018-10-09 10:10:10.0 | 2018-10-09 10:10:10.0 | 2 |
SELECT * FROM hdd_test2 LIMIT 144
4 rows (34ms)查看CSV
id | created_at | updated_at | b_id |
---|---|---|---|
1 | 2018-10-09 10:10:10.0 | 2018-10-09 10:10:10.0 | 1 |
2 | 2018-10-09 10:10:10.0 | 2018-10-09 10:10:10.0 | 2 |
3 | 2018-11-05 20:44:00.0 | 2018-11-05 20:44:00.0 | 1 |
4 | 2018-11-05 20:44:04.0 | 2018-11-05 20:44:04.0 | 1 |
SELECT * FROM hdd_test1 a LEFT JOIN hdd_test2 b ON a.a_id = b.b_id LIMIT 144
4 rows (110ms)查看CSV
id | created_at | updated_at | a_id | id | created_at | updated_at | b_id |
---|---|---|---|---|---|---|---|
1 | 2018-10-09 10:10:10.0 | 2018-10-09 10:10:10.0 | 1 | 1 | 2018-10-09 10:10:10.0 | 2018-10-09 10:10:10.0 | 1 |
2 | 2018-10-09 10:10:10.0 | 2018-10-09 10:10:10.0 | 2 | 2 | 2018-10-09 10:10:10.0 | 2018-10-09 10:10:10.0 | 2 |
1 | 2018-10-09 10:10:10.0 | 2018-10-09 10:10:10.0 | 1 | 3 | 2018-11-05 20:44:00.0 | 2018-11-05 20:44:00.0 | 1 |
1 | 2018-10-09 10:10:10.0 | 2018-10-09 10:10:10.0 | 1 | 4 | 2018-11-05 20:44:04.0 | 2018-11-05 20:44:04.0 | 1 |
看到没? 产生了四条记录,因为这个时候b 有重复的b_id,如果这时候计算a表值的count 或者sum ,结果会比预想的大。
虽然很简单,但是很容易犯错误