PySpark RDD join functions: an introduction to join, leftOuterJoin, rightOuterJoin, fullOuterJoin, and union

Using the various JOINs in Spark Core

1. inner join

An inner join returns only the keys that are matched on both the left and the right side.

>>> data2 = sc.parallelize(range(6,15)).map(lambda line:(line,1))
>>> data2.collect()
[(6, 1), (7, 1), (8, 1), (9, 1), (10, 1), (11, 1), (12, 1), (13, 1), (14, 1)]   
>>> data1 = sc.parallelize(range(10)).map(lambda line:(line,1))
>>> data1.collect()
[(0, 1), (1, 1), (2, 1), (3, 1), (4, 1), (5, 1), (6, 1), (7, 1), (8, 1), (9, 1)]
>>> data1.join(data2)
PythonRDD[14] at RDD at PythonRDD.scala:43
>>> data1.join(data2).collect()
[(6, (1, 1)), (7, (1, 1)), (8, (1, 1)), (9, (1, 1))]  
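To make the matching rule concrete, here is a minimal plain-Python sketch of what an inner join over (key, value) pairs computes. It is only an illustration of the semantics, not how Spark implements `join`; the helper name `inner_join` is hypothetical, and the lists stand in for `data1` and `data2`.

```python
from collections import defaultdict

def inner_join(left, right):
    """Plain-Python emulation of an inner join over (key, value) pairs."""
    # Group the right-hand values by key for fast lookup.
    right_by_key = defaultdict(list)
    for k, v in right:
        right_by_key[k].append(v)
    # Emit one (key, (left_value, right_value)) per matching pair;
    # keys missing on the right produce nothing.
    return [(k, (lv, rv)) for k, lv in left for rv in right_by_key.get(k, [])]

data1 = [(x, 1) for x in range(10)]      # same pairs as data1 above
data2 = [(x, 1) for x in range(6, 15)]   # same pairs as data2 above
result = inner_join(data1, data2)
print(result)
```

Only keys 6 to 9 appear in both inputs, so only they survive. If a key occurs several times on either side, every left/right combination for that key is emitted. Note that on a real RDD the order of the collected result is not guaranteed.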

2. left outer join

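left: the left side is the baseline.

Every record from the left RDD (a) always appears in the result. For the right RDD (b), the matching value is returned when the key exists there, and None is filled in when it does not. (The Scala API wraps the right-hand value in Some(x); in PySpark the value is returned as-is.)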

>>> data1.leftOuterJoin(data2).collect()
[(0, (1, None)), (1, (1, None)), (2, (1, None)), (3, (1, None)), (4, (1, None)), (5, (1, None)), (6, (1, 1)), (7, (1, 1)), (8, (1, 1)), (9, (1, 1))]
>>> data2.leftOuterJoin(data1).collect()
[(6, (1, 1)), (7, (1, 1)), (8, (1, 1)), (9, (1, 1)), (10, (1, None)), (11, (1, None)), (12, (1, None)), (13, (1, None)), (14, (1, None))]
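The None-filling behaviour can be sketched in plain Python as follows. This is an emulation of the semantics under the assumption that the inputs are ordinary lists of (key, value) pairs; `left_outer_join` is a hypothetical helper, not a PySpark function.

```python
from collections import defaultdict

def left_outer_join(left, right):
    """Plain-Python emulation of a left outer join over (key, value) pairs."""
    right_by_key = defaultdict(list)
    for k, v in right:
        right_by_key[k].append(v)
    out = []
    for k, lv in left:
        # [None] stands in when the key has no match on the right side.
        for rv in right_by_key.get(k) or [None]:
            out.append((k, (lv, rv)))
    return out

data1 = [(x, 1) for x in range(10)]
data2 = [(x, 1) for x in range(6, 15)]
joined = left_outer_join(data1, data2)
print(joined)
```

All ten keys of the left input are kept; keys 0 to 5 get None as the right-hand value, matching the output of `data1.leftOuterJoin(data2)` above.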

3. right outer join

right: the right side is the baseline.

>>> data1.rightOuterJoin(data2).collect()
[(6, (1, 1)), (7, (1, 1)), (8, (1, 1)), (9, (1, 1)), (10, (None, 1)), (11, (None, 1)), (12, (None, 1)), (13, (None, 1)), (14, (None, 1))]
>>> data2.rightOuterJoin(data1).collect()
[(0, (None, 1)), (1, (None, 1)), (2, (None, 1)), (3, (None, 1)), (4, (None, 1)), (5, (None, 1)), (6, (1, 1)), (7, (1, 1)), (8, (1, 1)), (9, (1, 1))]

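Every record from the right RDD (b) always appears in the result. For the left RDD (a), the matching value is returned when the key exists there, and None is filled in when it does not.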
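Comparing the outputs shown so far suggests that `a.rightOuterJoin(b)` carries the same information as `b.leftOuterJoin(a)` with the two values in each tuple swapped. The short check below rebuilds the two result lists from the sessions above as plain Python lists and verifies this; it does not call Spark.

```python
# Result of data1.rightOuterJoin(data2).collect(), as shown above.
right_join = ([(k, (1, 1)) for k in range(6, 10)]
              + [(k, (None, 1)) for k in range(10, 15)])

# Result of data2.leftOuterJoin(data1).collect(), as shown above.
left_join = ([(k, (1, 1)) for k in range(6, 10)]
             + [(k, (1, None)) for k in range(10, 15)])

# Swap the (left, right) values of the left outer join result.
swapped = [(k, (b, a)) for k, (a, b) in left_join]

# Compare as sets, because the order of a collected join is not guaranteed.
same = set(swapped) == set(right_join)
print(same)
```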

4. full outer join

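Note: before using a JOIN, make sure you know what data structure it returns. Each element is a (key, (left_value, right_value)) pair, and with a full outer join either value may be None.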

>>> data1.fullOuterJoin(data2).collect()
[(0, (1, None)), (1, (1, None)), (2, (1, None)), (3, (1, None)), (4, (1, None)), (5, (1, None)), (6, (1, 1)), (7, (1, 1)), (8, (1, 1)), (9, (1, 1)), (10, (None, 1)), (11, (None, 1)), (12, (None, 1)), (13, (None, 1)), (14, (None, 1))]
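Because either side of the value tuple can be None, downstream code has to handle the missing case explicitly. The sketch below works on a plain list that mirrors the collected result above and sums both sides per key, treating None as 0. The helper `total` is hypothetical; on a real RDD the same function could presumably be applied with `mapValues(total)`.

```python
# Mirrors data1.fullOuterJoin(data2).collect() from the session above.
full = ([(k, (1, None)) for k in range(6)]
        + [(k, (1, 1)) for k in range(6, 10)]
        + [(k, (None, 1)) for k in range(10, 15)])

def total(pair):
    """Sum both sides of a joined value, counting a missing side as 0."""
    left, right = pair
    left = 0 if left is None else left
    right = 0 if right is None else right
    return left + right

totals = [(k, total(v)) for k, v in full]
print(totals)
```

An explicit `is None` check is used instead of `left or 0` so that a legitimate value of 0 is not confused with a missing one.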

5. union

>>> data1.union(data2).collect()
[(0, 1), (1, 1), (2, 1), (3, 1), (4, 1), (5, 1), (6, 1), (7, 1), (8, 1), (9, 1), (6, 1), (7, 1), (8, 1), (9, 1), (10, 1), (11, 1), (12, 1), (13, 1), (14, 1)]
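As the output shows, union simply concatenates the two datasets and keeps duplicates: the pairs for keys 6 to 9 appear twice. In plain Python this behaves like list concatenation, as in the sketch below. The deduplication step is an added illustration; on an RDD the counterpart would be `distinct()`, whose output order is not guaranteed.

```python
data1 = [(x, 1) for x in range(10)]
data2 = [(x, 1) for x in range(6, 15)]

# Emulates data1.union(data2): plain concatenation, duplicates kept.
unioned = data1 + data2

# Drop duplicates while preserving first-seen order.
deduped = list(dict.fromkeys(unioned))

print(len(unioned), len(deduped))
```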

Reference: https://blog.csdn.net/wawa8899/article/details/81027633
