Spark-SQL 之 join 类型

本文详细介绍了数据库中的各种Join类型,包括内连接、左连接和全连接,并重点解析了ShuffleHashJoin的启用条件。ShuffleHashJoin要求等值Join,禁止SortMergeJoin,且小表大小需满足特定比例限制。此外,文章还通过实例展示了不同Join类型的使用。
摘要由CSDN通过智能技术生成
Shuffle Hash Join

启用 Shuffle Hash Join 必须满足以下几个条件:

  • 仅支持等值 Join,不要求参与 Join 的 Keys 可排序;
  • spark.sql.join.preferSortMergeJoin 参数必须设置为 false,参数是从 Spark 2.0.0 版本引入的,默认值为 true,也就是默认情况下选择 Sort Merge Join;
  • 小表的大小(plan.stats.sizeInBytes)必须小于 spark.sql.autoBroadcastJoinThreshold * spark.sql.shuffle.partitions;而且小表大小(stats.sizeInBytes)的三倍必须小于等于大表的大小(stats.sizeInBytes),也就是 a.stats.sizeInBytes * 3 < = b.stats.sizeInBytes

常见 Join 类型

亦可参考: https://mazhuang.org/2017/09/11/joins-in-sql/

inner join

在这里插入图片描述

select * from studentone_message a
INNER JOIN studenttwo_message b
on a.id=b.id

隐形内连接:where连接
有显性内连接,当然也会有隐形内连接。一般的where 连接多个表也属于内连接,在数据库中被称为隐性内链接。如下:

select * from studentone_message a,studenttwo_message b
WHERE a.id=b.id
left join

LEFT JOIN 一般被译作左连接,也写作 LEFT OUTER JOIN。左连接查询会返回左表(表 A)中所有记录,不管右表(表 B)中有没有关联的数据。在右表中找到的关联数据列也会被一起返回。

文氏图:

SELECT A.PK AS A_PK, B.PK AS B_PK,
       A.Value AS A_Value, B.Value AS B_Value
FROM Table_A A
LEFT JOIN Table_B B
ON A.PK = B.PK;
FULL JOIN

在这里插入图片描述

select * from message1 a
FULL JOIN message2 b
on a.id=b.id
Join 的几种类型

参考链接:
https://qileq.com/tech/spark/sql/join/

Spark 的 HashJoin.scala 代码知,目前 Spark 支持如下几种类型的 join:
  • Inner join

内连接,语法:relation [ INNER ] JOIN relation [ join_criteria ]。

  • Cross join

笛卡尔连接,语法:relation CROSS JOIN relation [ join_criteria ]。

  • Left (outer) join

左(外)连接,语法:relation LEFT [ OUTER ] JOIN relation [ join_criteria ]。

  • Right (outer) join

右(外)连接,语法:relation RIGHT [ OUTER ] JOIN relation [ join_criteria ]。

  • Full (outer) join

全(外)连接,语法:relation FULL [ OUTER ] JOIN relation [ join_criteria ]。

  • (Left) Semi join

(左)半连接,语法:relation [ LEFT ] SEMI JOIN relation [ join_criteria ]。

  • (Left) Anti join

(左)反连接,语法:relation [ LEFT ] ANTI JOIN relation [ join_criteria ]。

Join 的示例
-- 源表 department
> SELECT * FROM department;
deptno  deptname
1   Marketing
2   Sales
3   Engineering

-- 源表 employee
> SELECT * FROM employee;
id  name    deptno
1   Amy 1
2   Lisa    2
3   John    2
4   Paul    4

-- Inner join
> SELECT id, name, deptname FROM employee INNER JOIN department ON employee.deptno = department.deptno;
id  name    deptname
1   Amy Marketing
2   Lisa    Sales
3   John    Sales

-- Left outer join
> SELECT id, name, deptname FROM employee LEFT JOIN department ON employee.deptno = department.deptno;
id  name    deptname
1   Amy Marketing
2   Lisa    Sales
3   John    Sales
4   Paul    NULL

-- Right outer join
> SELECT id, name, deptname FROM employee RIGHT JOIN department ON employee.deptno = department.deptno;
id  name    deptname
1   Amy Marketing
3   John    Sales
2   Lisa    Sales
NULL    NULL    Engineering

-- Full outer join
> SELECT id, name, deptname FROM employee FULL JOIN department ON employee.deptno = department.deptno;
id  name    deptname
1   Amy Marketing
2   Lisa    Sales
3   John    Sales
NULL    NULL    Engineering
4   Paul    NULL

-- Cross join
> SELECT id, name, deptname FROM employee CROSS JOIN department ON employee.deptno = department.deptno;
-- or implicit cross join: 
-- SELECT employee.id AS id, name, deptname FROM employee, department;
id  name    deptname
1   Amy Marketing
1   Amy Sales
1   Amy Engineering
2   Lisa    Marketing
2   Lisa    Sales
2   Lisa    Engineering
3   John    Marketing
3   John    Sales
3   John    Engineering
4   Paul    Marketing
4   Paul    Sales
4   Paul    Engineering

-- Cross join + on
> SELECT id, name, deptname FROM employee CROSS JOIN department ON employee.deptno = department.deptno;
id      name    deptname
1   Amy Marketing
2   Lisa    Sales
3   John    Sales

-- Semi join
> SELECT id, name, deptno FROM employee SEMI JOIN department ON employee.deptno = department.deptno;
id  name    deptno
1   Amy 1
2   Lisa    2
3   John    2

-- Anti join
> SELECT id, name, deptno FROM employee ANTI JOIN department ON employee.deptno = department.deptno;
id  name    deptno
4   Paul    4
评论 3
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值