PHP left join, Left Join

喝前尧一尧

Published 2021-04-01 04:44:29


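A developer had a statement that ran for more than two hours without returning a result, and asked me why it was taking so long.

The statement had the form select * from tgt1 a left join tgt2 b on a.id=b.id and a.id>=6 order by a.id; This is a classic misunderstanding: the intent was to filter table a first and then do the left join. Let's look at what a left join really does.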

[gpadmin@mdw ~]$ psql bigdatagp

psql (8.2.15)

Type "help" for help.

bigdatagp=# drop table tgt1;

DROP TABLE

bigdatagp=# drop table tgt2;

DROP TABLE

bigdatagp=# explain select t1.telnumber,t2.ua,t2.url,t1.apply_name,t2.apply_name from gpbase.tb_csv_gn_ip_session t1 ,gpbase.tb_csv_gn_http_session_hw t2 where t1.

bigdatagp=# \q

bigdatagp=# create table tgt1(id int, name varchar(20));

NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'id' as the Greenplum Database data distribution key for this table.

HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.

CREATE TABLE

bigdatagp=# create table tgt2(id int, name varchar(20));

NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'id' as the Greenplum Database data distribution key for this table.

HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.

CREATE TABLE

bigdatagp=# insert into tgt1 select generate_series(1,3),('a','b');

ERROR: column "name" is of type character varying but expression is of type record

HINT: You will need to rewrite or cast the expression.

bigdatagp=# insert into tgt1 select generate_series(1,5),generate_series(1,5)||'a';

INSERT 0 5

bigdatagp=# insert into tgt2 select generate_series(1,2),generate_series(1,2)||'a';

INSERT 0 2

bigdatagp=# select * from tgt1;

id | name

----+------

2 | 2a

4 | 4a

1 | 1a

3 | 3a

5 | 5a

(5 rows)

bigdatagp=# select * from tgt1 order by id;

id | name

----+------

1 | 1a

2 | 2a

3 | 3a

4 | 4a

5 | 5a

(5 rows)

bigdatagp=# select * from tgt2 order by id;

id | name

----+------

1 | 1a

2 | 2a

(2 rows)

bigdatagp=# select * from tgt1 a left join tgt2 b on a.id=b.id;

id | name | id | name

----+------+----+------

3 | 3a | |

5 | 5a | |

1 | 1a | 1 | 1a

2 | 2a | 2 | 2a

4 | 4a | |

(5 rows)

bigdatagp=# select * from tgt1 a left join tgt2 b on a.id=b.id order by a.id;

id | name | id | name

----+------+----+------

1 | 1a | 1 | 1a

2 | 2a | 2 | 2a

3 | 3a | |

4 | 4a | |

5 | 5a | |

(5 rows)

bigdatagp=# select * from tgt1 a left join tgt2 b on a.id=b.id where id>=3 order by a.id;

ERROR: column reference "id" is ambiguous

LINE 1: ...* from tgt1 a left join tgt2 b on a.id=b.id where id>=3 orde...

bigdatagp=# select * from tgt1 a left join tgt2 b on a.id=b.id where a.id>=3 order by a.id;

id | name | id | name

----+------+----+------

3 | 3a | |

4 | 4a | |

5 | 5a | |

(3 rows)

bigdatagp=# select * from tgt1 a left join tgt2 b on a.id=b.id and a.id>=3 order by a.id;

id | name | id | name

----+------+----+------

1 | 1a | |

2 | 2a | |

3 | 3a | |

4 | 4a | |

5 | 5a | |

(5 rows)

bigdatagp=# select * from tgt1 a left join tgt2 b on a.id=b.id where a.id>=6 order by a.id;

id | name | id | name

----+------+----+------

(0 rows)

bigdatagp=# select * from tgt1 a left join tgt2 b on a.id=b.id and a.id>=6 order by a.id;

id | name | id | name

----+------+----+------

1 | 1a | |

2 | 2a | |

3 | 3a | |

4 | 4a | |

5 | 5a | |

(5 rows)
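The four result sets above can be reproduced outside Greenplum. The following is a minimal sketch that assumes an in-memory SQLite database as a stand-in for the cluster; the tables and rows mirror tgt1 and tgt2 from the session, while the helper `run` and the variable names are made up for illustration.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the Greenplum database used above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tgt1 (id INTEGER, name TEXT);
    CREATE TABLE tgt2 (id INTEGER, name TEXT);
    INSERT INTO tgt1 VALUES (1,'1a'),(2,'2a'),(3,'3a'),(4,'4a'),(5,'5a');
    INSERT INTO tgt2 VALUES (1,'1a'),(2,'2a');
""")

def run(sql):
    """Hypothetical helper: execute a query and return all rows as tuples."""
    return conn.execute(sql).fetchall()

base = "SELECT * FROM tgt1 a LEFT JOIN tgt2 b ON a.id = b.id"

# Predicate in WHERE: rows of a are filtered out of the result.
where_ge3 = run(base + " WHERE a.id >= 3 ORDER BY a.id")
where_ge6 = run(base + " WHERE a.id >= 6 ORDER BY a.id")

# Predicate in ON: every row of a survives; only the matching of b changes.
on_ge3 = run(base + " AND a.id >= 3 ORDER BY a.id")
on_ge6 = run(base + " AND a.id >= 6 ORDER BY a.id")

print(len(where_ge3), len(where_ge6), len(on_ge3), len(on_ge6))
```

With the predicate in ON, all five rows of tgt1 come back with NULLs on the tgt2 side, which is why the original statement scanned and returned the whole of table a instead of a filtered subset.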

bigdatagp=# explain analyze select * from tgt1 a left join tgt2 b on a.id=b.id where a.id>=3 order by a.id;

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------

Gather Motion 64:1 (slice1; segments: 64) (cost=7.18..7.19 rows=1 width=14)

Merge Key: "?column5?"

Rows out: 3 rows at destination with 21 ms to end, start offset by 559 ms.

-> Sort (cost=7.18..7.19 rows=1 width=14)

Sort Key: a.id

Rows out: Avg 1.0 rows x 3 workers. Max 1 rows (seg52) with 5.452 ms to first row, 5.454 ms to end, start offset by 564 ms.

Executor memory: 63K bytes avg, 74K bytes max (seg2).

Work_mem used: 63K bytes avg, 74K bytes max (seg2). Workfile: (0 spilling, 0 reused)

-> Hash Left Join (cost=2.04..7.15 rows=1 width=14)

Hash Cond: a.id = b.id

Rows out: Avg 1.0 rows x 3 workers. Max 1 rows (seg52) with 4.190 ms to first row, 4.598 ms to end, start offset by 565 ms.

-> Seq Scan on tgt1 a (cost=0.00..5.06 rows=1 width=7)

Filter: id >= 3

Rows out: Avg 1.0 rows x 3 workers. Max 1 rows (seg52) with 0.156 ms to first row, 0.158 ms to end, start offset by 565 ms.

-> Hash (cost=2.02..2.02 rows=1 width=7)

Rows in: (No row requested) 0 rows (seg0) with 0 ms to end.

-> Seq Scan on tgt2 b (cost=0.00..2.02 rows=1 width=7)

Rows out: (No row requested) 0 rows (seg0) with 0 ms to end.

Slice statistics:

(slice0) Executor memory: 332K bytes.

(slice1) Executor memory: 446K bytes avg x 64 workers, 4329K bytes max (seg52). Work_mem: 74K bytes max.

Statement statistics:

Memory used: 128000K bytes

Total runtime: 580.630 ms

(24 rows)

bigdatagp=# explain analyze select * from tgt1 a left join tgt2 b on a.id=b.id and a.id>=3 order by a.id;

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------

Gather Motion 64:1 (slice1; segments: 64) (cost=7.23..7.24 rows=1 width=14)

Merge Key: "?column5?"

Rows out: 5 rows at destination with 24 ms to end, start offset by 701 ms.

-> Sort (cost=7.23..7.24 rows=1 width=14)

Sort Key: a.id

Rows out: Avg 1.0 rows x 5 workers. Max 1 rows (seg42) with 6.292 ms to first row, 6.294 ms to end, start offset by 715 ms.

Executor memory: 70K bytes avg, 74K bytes max (seg0).

Work_mem used: 70K bytes avg, 74K bytes max (seg0). Workfile: (0 spilling, 0 reused)

-> Hash Left Join (cost=2.04..7.17 rows=1 width=14)

Hash Cond: a.id = b.id

Join Filter: a.id >= 3

Rows out: Avg 1.0 rows x 5 workers. Max 1 rows (seg42) with 4.422 ms to first row, 5.055 ms to end, start offset by 717 ms.

Executor memory: 1K bytes avg, 1K bytes max (seg42).

Work_mem used: 1K bytes avg, 1K bytes max (seg42). Workfile: (0 spilling, 0 reused)

(seg42) Hash chain length 1.0 avg, 1 max, using 1 of 262151 buckets.

-> Seq Scan on tgt1 a (cost=0.00..5.05 rows=1 width=7)

Rows out: Avg 1.0 rows x 5 workers. Max 1 rows (seg42) with 0.179 ms to first row, 0.180 ms to end, start offset by 717 ms.

-> Hash (cost=2.02..2.02 rows=1 width=7)

Rows in: Avg 1.0 rows x 2 workers. Max 1 rows (seg42) with 0.194 ms to end, start offset by 721 ms.

-> Seq Scan on tgt2 b (cost=0.00..2.02 rows=1 width=7)

Rows out: Avg 1.0 rows x 2 workers. Max 1 rows (seg42) with 0.143 ms to first row, 0.145 ms to end, start offset by 721 ms.

Slice statistics:

(slice0) Executor memory: 332K bytes.

(slice1) Executor memory: 581K bytes avg x 64 workers, 4353K bytes max (seg42). Work_mem: 74K bytes max.

Statement statistics:

Memory used: 128000K bytes

Total runtime: 725.316 ms

(27 rows)
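The two plans above differ in where the predicate sits: as a Filter on the scan of tgt1 when it is written in WHERE, and as a Join Filter when it is written in ON. The sketch below is a simplified nested-loop model of left join semantics, not the hash join the executor actually runs; the function `left_join` and its parameters are hypothetical and exist only to show why a false ON predicate still emits the left row.

```python
def left_join(left, right, on, right_width):
    """Simplified LEFT JOIN: every left row is emitted at least once."""
    out = []
    for l_row in left:
        matched = [r_row for r_row in right if on(l_row, r_row)]
        if matched:
            out.extend(l_row + r_row for r_row in matched)
        else:
            # No right row satisfied ON: keep the left row, pad with NULLs.
            out.append(l_row + (None,) * right_width)
    return out

# Same rows as tgt1 and tgt2 in the session above.
tgt1 = [(i, f"{i}a") for i in range(1, 6)]
tgt2 = [(i, f"{i}a") for i in range(1, 3)]

# Predicate inside ON: it only decides whether a pair counts as a match.
in_on = left_join(tgt1, tgt2, lambda a, b: a[0] == b[0] and a[0] >= 3, 2)

# Predicate in WHERE: it filters rows of the join result (the planner may
# push it down to the scan of tgt1, as the Filter line in the first plan shows).
joined = left_join(tgt1, tgt2, lambda a, b: a[0] == b[0], 2)
in_where = [row for row in joined if row[0] >= 3]

print(len(in_on), len(in_where))  # prints "5 3"
```

In this model, rows 1 and 2 of tgt1 fail `a.id >= 3`, so they find no match and are padded with NULLs rather than dropped, matching the five-row output of the ON variant.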

bigdatagp=# explain analyze select * from tgt1 a left join tgt2 b on a.id=b.id where a.id>=6 order by a.id;

QUERY PLAN

--------------------------------------------------------------------------------------------------------------

Gather Motion 64:1 (slice1; segments: 64) (cost=7.17..7.18 rows=1 width=14)

Merge Key: "?column5?"

Rows out: (No row requested) 0 rows at destination with 6.536 ms to end, start offset by 1.097 ms.

-> Sort (cost=7.17..7.18 rows=1 width=14)

Sort Key: a.id

Rows out: (No row requested) 0 rows (seg0) with 0 ms to end.

Executor memory: 33K bytes avg, 33K bytes max (seg0).

Work_mem used: 33K bytes avg, 33K bytes max (seg0). Workfile: (0 spilling, 0 reused)

-> Hash Left Join (cost=2.04..7.15 rows=1 width=14)

Hash Cond: a.id = b.id

Rows out: (No row requested) 0 rows (seg0) with 0 ms to end.

-> Seq Scan on tgt1 a (cost=0.00..5.06 rows=1 width=7)

Filter: id >= 6

Rows out: (No row requested) 0 rows (seg0) with 0 ms to end.

-> Hash (cost=2.02..2.02 rows=1 width=7)

Rows in: (No row requested) 0 rows (seg0) with 0 ms to end.

-> Seq Scan on tgt2 b (cost=0.00..2.02 rows=1 width=7)

Rows out: (No row requested) 0 rows (seg0) with 0 ms to end.

Slice statistics:

(slice0) Executor memory: 332K bytes.

(slice1) Executor memory: 225K bytes avg x 64 workers, 225K bytes max (seg0). Work_mem: 33K bytes max.

Statement statistics:

Memory used: 128000K bytes

Total runtime: 8.615 ms

(24 rows)

bigdatagp=# explain analyze select * from tgt1 a left join tgt2 b on a.id=b.id and a.id>=6 order by a.id;

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------

Gather Motion 64:1 (slice1; segments: 64) (cost=7.23..7.24 rows=1 width=14)

Merge Key: "?column5?"

Rows out: 5 rows at destination with 115 ms to end, start offset by 1.195 ms.

-> Sort (cost=7.23..7.24 rows=1 width=14)

Sort Key: a.id

Rows out: Avg 1.0 rows x 5 workers. Max 1 rows (seg42) with 6.979 ms to first row, 6.980 ms to end, start offset by 12 ms.

Executor memory: 72K bytes avg, 74K bytes max (seg0).

Work_mem used: 72K bytes avg, 74K bytes max (seg0). Workfile: (0 spilling, 0 reused)

-> Hash Left Join (cost=2.04..7.17 rows=1 width=14)

Hash Cond: a.id = b.id

Join Filter: a.id >= 6

Rows out: Avg 1.0 rows x 5 workers. Max 1 rows (seg42) with 5.570 ms to first row, 6.157 ms to end, start offset by 12 ms.

Executor memory: 1K bytes avg, 1K bytes max (seg42).

Work_mem used: 1K bytes avg, 1K bytes max (seg42). Workfile: (0 spilling, 0 reused)

(seg42) Hash chain length 1.0 avg, 1 max, using 1 of 262151 buckets.

-> Seq Scan on tgt1 a (cost=0.00..5.05 rows=1 width=7)

Rows out: Avg 1.0 rows x 5 workers. Max 1 rows (seg42) with 0.050 ms to first row, 0.051 ms to end, start offset by 12 ms.

-> Hash (cost=2.02..2.02 rows=1 width=7)

Rows in: Avg 1.0 rows x 2 workers. Max 1 rows (seg42) with 0.153 ms to end, start offset by 18 ms.

-> Seq Scan on tgt2 b (cost=0.00..2.02 rows=1 width=7)

Rows out: Avg 1.0 rows x 2 workers. Max 1 rows (seg42) with 0.133 ms to first row, 0.135 ms to end, start offset by 18 ms.

Slice statistics:

(slice0) Executor memory: 332K bytes.

(slice1) Executor memory: 583K bytes avg x 64 workers, 4353K bytes max (seg42). Work_mem: 74K bytes max.

Statement statistics:

Memory used: 128000K bytes

Total runtime: 116.997 ms

(27 rows)

bigdatagp=# explain analyze select * from tgt1 a left join tgt2 b on a.id=b.id where id=6 order by a.id;

ERROR: column reference "id" is ambiguous

LINE 1: ...* from tgt1 a left join tgt2 b on a.id=b.id where id=6 order...

bigdatagp=# explain analyze select * from tgt1 a left join tgt2 b on a.id=b.id where a.id=6 order by a.id;

QUERY PLAN

-----------------------------------------------------------------------------------------------------

Gather Motion 1:1 (slice1; segments: 1) (cost=7.17..7.18 rows=4 width=14)

Merge Key: "?column5?"

Rows out: (No row requested) 0 rows at destination with 3.212 ms to end, start offset by 339 ms.

-> Sort (cost=7.17..7.18 rows=1 width=14)

Sort Key: a.id

Rows out: (No row requested) 0 rows with 0 ms to end.

Executor memory: 58K bytes.

Work_mem used: 58K bytes. Workfile: (0 spilling, 0 reused)

-> Hash Left Join (cost=2.04..7.14 rows=1 width=14)

Hash Cond: a.id = b.id

Rows out: (No row requested) 0 rows with 0 ms to end.

-> Seq Scan on tgt1 a (cost=0.00..5.06 rows=1 width=7)

Filter: id = 6

Rows out: (No row requested) 0 rows with 0 ms to end.

-> Hash (cost=2.02..2.02 rows=1 width=7)

Rows in: (No row requested) 0 rows with 0 ms to end.

-> Seq Scan on tgt2 b (cost=0.00..2.02 rows=1 width=7)

Filter: id = 6

Rows out: (No row requested) 0 rows with 0 ms to end.

Slice statistics:

(slice0) Executor memory: 252K bytes.

(slice1) Executor memory: 251K bytes (seg3). Work_mem: 58K bytes max.

Statement statistics:

Memory used: 128000K bytes

Total runtime: 342.067 ms

(25 rows)

bigdatagp=# explain analyze select * from tgt1 a left join tgt2 b on a.id=b.id and a.id=6 order by a.id;

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------

Gather Motion 64:1 (slice1; segments: 64) (cost=7.23..7.24 rows=1 width=14)

Merge Key: "?column5?"

Rows out: 5 rows at destination with 435 ms to end, start offset by 1.130 ms.

-> Sort (cost=7.23..7.24 rows=1 width=14)

Sort Key: a.id

Rows out: Avg 1.0 rows x 5 workers. Max 1 rows (seg42) with 5.156 ms to first row, 5.158 ms to end, start offset by 7.597 ms.

Executor memory: 58K bytes avg, 58K bytes max (seg0).

Work_mem used: 58K bytes avg, 58K bytes max (seg0). Workfile: (0 spilling, 0 reused)

-> Hash Left Join (cost=2.04..7.17 rows=1 width=14)

Hash Cond: a.id = b.id

Join Filter: a.id = 6

Rows out: Avg 1.0 rows x 5 workers. Max 1 rows (seg42) with 4.155 ms to first row, 4.813 ms to end, start offset by 7.930 ms.

Executor memory: 1K bytes avg, 1K bytes max (seg42).

Work_mem used: 1K bytes avg, 1K bytes max (seg42). Workfile: (0 spilling, 0 reused)

(seg42) Hash chain length 1.0 avg, 1 max, using 1 of 262151 buckets.

-> Seq Scan on tgt1 a (cost=0.00..5.05 rows=1 width=7)

Rows out: Avg 1.0 rows x 5 workers. Max 1 rows (seg42) with 0.126 ms to first row, 0.127 ms to end, start offset by 7.941 ms.

-> Hash (cost=2.02..2.02 rows=1 width=7)

Rows in: Avg 1.0 rows x 2 workers. Max 1 rows (seg42) with 0.103 ms to end, start offset by 12 ms.

-> Seq Scan on tgt2 b (cost=0.00..2.02 rows=1 width=7)

Rows out: Avg 1.0 rows x 2 workers. Max 1 rows (seg42) with 0.074 ms to first row, 0.076 ms to end, start offset by 12 ms.

Slice statistics:

(slice0) Executor memory: 332K bytes.

(slice1) Executor memory: 569K bytes avg x 64 workers, 4337K bytes max (seg42). Work_mem: 58K bytes max.

Statement statistics:

Memory used: 128000K bytes

Total runtime: 436.384 ms

(27 rows)

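So to filter table a, put the condition in the WHERE clause; to filter table b, put the condition in a subquery on table b. The ON clause only decides which rows of b are shown next to each row of a; it never removes rows of a from the result.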
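For the table b side, the session above does not include a run, so here is a small sketch under the same assumption as before (in-memory SQLite standing in for Greenplum, with the tgt1 and tgt2 rows from the session). The predicate `id >= 2` and the variable names are chosen purely for illustration. It compares filtering b in a subquery with putting the same predicate on b in WHERE.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the Greenplum database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tgt1 (id INTEGER, name TEXT);
    CREATE TABLE tgt2 (id INTEGER, name TEXT);
    INSERT INTO tgt1 VALUES (1,'1a'),(2,'2a'),(3,'3a'),(4,'4a'),(5,'5a');
    INSERT INTO tgt2 VALUES (1,'1a'),(2,'2a');
""")

# Filter b inside a subquery: all rows of a are kept, b is reduced first.
b_in_subquery = conn.execute("""
    SELECT a.id, b.id
    FROM tgt1 a
    LEFT JOIN (SELECT * FROM tgt2 WHERE id >= 2) b ON a.id = b.id
    ORDER BY a.id
""").fetchall()

# Filter b in WHERE: NULL-extended rows fail the test and disappear,
# so the left join behaves like an inner join.
b_in_where = conn.execute("""
    SELECT a.id, b.id
    FROM tgt1 a
    LEFT JOIN tgt2 b ON a.id = b.id
    WHERE b.id >= 2
    ORDER BY a.id
""").fetchall()

print(b_in_subquery)
print(b_in_where)
```

The subquery form keeps every row of a and attaches only the surviving b rows, whereas the WHERE form discards the rows of a that have no match, since a comparison against NULL is not true.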

-EOF-

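This article was originally published on php中文网 (PHP Chinese Network). Please credit the source when reposting. Thank you for your respect!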
