The join method in Spark SQL


join(other, on=None, how=None)

Joins with another DataFrame, using the given join expression.

Parameters

        other – Right side of the join

        on – a string for the join column name, a list of column names, a join expression (Column), or a list of Columns. If on is a string or a list of strings indicating the name of the join column(s), the column(s) must exist on both sides, and this performs an equi-join.

        how – str, default inner. Must be one of: inner, cross, outer, full, full_outer, left, left_outer, right, right_outer, left_semi, and left_anti.
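To make the how values concrete, here is a minimal plain-Python sketch of what several of them return for an equi-join on one column. It only models the semantics and is not how Spark executes a join. The helper join_rows and the sample rows are hypothetical, made up for illustration, and the sketch leaves out cross and the right-side variants.

```python
def join_rows(left, right, key, how="inner"):
    """Model an equi-join of two lists of dict rows on one shared column.

    Hypothetical helper for illustration; assumes the two sides share
    no column names other than `key`.
    """
    aliases = {"outer": "full", "full_outer": "full", "left_outer": "left"}
    how = aliases.get(how, how)
    if how not in ("inner", "full", "left", "left_semi", "left_anti"):
        raise ValueError(f"join type not modelled in this sketch: {how}")

    # As in SQL, a null (None) key never matches anything.
    right_keys = {r[key] for r in right if r[key] is not None}
    if how == "left_semi":   # left rows that have a match, left columns only
        return [dict(lrow) for lrow in left if lrow[key] in right_keys]
    if how == "left_anti":   # left rows that have no match, left columns only
        return [dict(lrow) for lrow in left if lrow[key] not in right_keys]

    left_cols = [c for c in (left[0] if left else {}) if c != key]
    right_cols = [c for c in (right[0] if right else {}) if c != key]
    out, matched = [], set()
    for lrow in left:
        hits = [i for i, rrow in enumerate(right)
                if lrow[key] is not None and rrow[key] == lrow[key]]
        for i in hits:
            matched.add(i)
            out.append({**lrow, **{c: right[i][c] for c in right_cols}})
        if not hits and how in ("left", "full"):
            # Unmatched left row: pad the right-side columns with None.
            out.append({**lrow, **{c: None for c in right_cols}})
    if how == "full":
        for i, rrow in enumerate(right):
            if i not in matched:
                # Unmatched right row: pad the left-side columns with None.
                out.append({key: rrow[key],
                            **{c: None for c in left_cols},
                            **{c: rrow[c] for c in right_cols}})
    return out


# Made-up sample rows.
people = [{"name": "Alice", "age": 2}, {"name": "Bob", "age": 5}]
heights = [{"name": "Tom", "height": 80}, {"name": "Bob", "height": 85}]

inner = join_rows(people, heights, "name")               # only Bob
full = join_rows(people, heights, "name", "outer")       # Alice, Bob, Tom
semi = join_rows(people, heights, "name", "left_semi")   # Bob, without height
anti = join_rows(people, heights, "name", "left_anti")   # Alice
```

left_semi and left_anti act as filters on the left side, which is why their rows carry no columns from the right.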

The following performs a full outer join between df and df2.

>>> df.join(df2, df.name == df2.name, 'outer').select(df.name, df2.height).collect()
[Row(name=None, height=80), Row(name='Bob', height=85), Row(name='Alice', height=None)]

>>> df.join(df2, 'name', 'outer').select('name', 'height').collect()
[Row(name='Tom', height=80), Row(name='Bob', height=85), Row(name='Alice', height=None)]

>>> cond = [df.name == df3.name, df.age == df3.age]
>>> df.join(df3, cond, 'outer').select(df.name, df3.age).collect()
[Row(name='Alice', age=2), Row(name='Bob', age=5)]

>>> df.join(df2, 'name').select(df.name, df2.height).collect()
[Row(name='Bob', height=85)]

>>> df.join(df4, ['name', 'age']).select(df.name, df.age).collect()
[Row(name='Bob', age=5)]
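The first two examples differ only in how on is given, yet the unmatched df2 row shows name=None in one and name='Tom' in the other. With a Column expression, both name columns are kept, and df.name is null for a row that exists only in df2. With a column name, the result has a single name column that holds the value from whichever side has the row. Here is a minimal plain-Python sketch of that difference. The input rows are an assumption inferred from the outputs above (the post does not show them), and full_outer_pairs is a hypothetical helper.

```python
# Assumed inputs, inferred from the outputs shown above.
df_rows = [("Alice", 2), ("Bob", 5)]      # (name, age)
df2_rows = [("Tom", 80), ("Bob", 85)]     # (name, height)


def full_outer_pairs(left, right):
    """Pair rows whose names match; an unmatched side is represented by None."""
    pairs, matched = [], set()
    for lrow in left:
        hits = [i for i, rrow in enumerate(right) if rrow[0] == lrow[0]]
        if hits:
            for i in hits:
                matched.add(i)
                pairs.append((lrow, right[i]))
        else:
            pairs.append((lrow, None))
    pairs += [(None, rrow) for i, rrow in enumerate(right) if i not in matched]
    return pairs


pairs = full_outer_pairs(df_rows, df2_rows)

# on = (df.name == df2.name): select(df.name, df2.height) reads the LEFT name,
# which is missing for a row that exists only on the right.
by_expression = [(lrow[0] if lrow else None, rrow[1] if rrow else None)
                 for lrow, rrow in pairs]

# on = 'name': a single name column, filled from whichever side is present.
by_name = [((lrow or rrow)[0], rrow[1] if rrow else None)
           for lrow, rrow in pairs]

print(by_expression)  # [('Alice', None), ('Bob', 85), (None, 80)]
print(by_name)        # [('Alice', None), ('Bob', 85), ('Tom', 80)]
```

The row order printed here comes from the sketch itself. Spark does not guarantee any particular row order for a join result.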

Example (Scala):

import org.apache.spark.sql.types._
import org.apache.spark.sql.{Row, SparkSession}

// Assumes an existing SparkSession named spark (as in spark-shell).
val dat1 = Array(("a", "2", "3", "4", "5"), ("b", "7", "8", "9", "10"))
val data1 = spark.createDataFrame(dat1).toDF("name", "col2", "col3", "col4", "col5")
data1.show()

val dat2 = Array(("b", "20", "30", "40", "50"), ("c", "70", "80", "90", "100"))
val data2 = spark.createDataFrame(dat2).toDF("name", "col2", "col3", "col4", "col5")

[Figures: data1 and data2]

data1.columns(0)
data1(data1.columns(0))

// Full outer join on the first column ("name") of each DataFrame
val dj = data2.join(data1, data1(data1.columns(0)) === data2(data2.columns(0)), "full")
dj.show()

data1.columns(0) returns the name of the first column as a String (here, "name").
data1(data1.columns(0)) returns a Column named name (see the Spark documentation).
dj is a DataFrame.
dj.show() prints the joined result.
