Merging DataFrames: join/union
1.1 join
# 1. Concatenate two DataFrames (rows stacked; duplicates are kept)
df3 = df1.union(df2)
df.unionAll(df.limit(1))  # unionAll is a deprecated alias for union(); here it appends df's first row back onto df
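Since this document compares Spark operations to pandas, union semantics can be sketched with pandas: Spark's union is a positional row-wise concatenation that keeps duplicates, much like pd.concat. The data below is hypothetical, chosen only to make the duplicate row visible.

```python
import pandas as pd

# hypothetical data: the row (2, "b") appears in both frames
df1 = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})
df2 = pd.DataFrame({"id": [2, 3], "name": ["b", "c"]})

# like Spark's df1.union(df2): rows stacked, duplicates kept
df3 = pd.concat([df1, df2], ignore_index=True)
print(df3["id"].tolist())  # → [1, 2, 2, 3]
```

To drop the duplicates afterwards, Spark chains `.distinct()`; the pandas counterpart is `drop_duplicates()`.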
# 2. Join on a condition
# single key
df = df_left.join(df_right, df_left.key == df_right.key, "inner")
# multiple keys (pass a list of column names)
df1.join(df2, ["id", "name"])
# keys with different names on each side
df1.join(df2, df1["id"] == df2["t1_id"])
The join operation is similar to pandas' merge: pay attention to whether the left or the right table drives the result, and whether the join is inner or outer.
1.2 Union and intersection of two DataFrames
# build the DataFrames (createDataFrame takes a list of row tuples)
sentenceDataFrame = spark.createDataFrame([
    (1, "asf"),
    (2, "21
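In the same pandas terms used earlier, intersection and deduplicated union of two DataFrames can be sketched as follows; in Spark these would be `df1.intersect(df2)` and `df1.union(df2).distinct()`. The data is hypothetical.

```python
import pandas as pd

# hypothetical frames: the row (2, "bar") is common to both
a = pd.DataFrame({"label": [1, 2], "sentence": ["foo", "bar"]})
b = pd.DataFrame({"label": [2, 4], "sentence": ["bar", "baz"]})

# intersection: rows present in both (merge on all shared columns)
both = a.merge(b, how="inner")
print(len(both))  # → 1

# union with duplicates removed
either = pd.concat([a, b]).drop_duplicates().reset_index(drop=True)
print(len(either))  # → 3
```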