Table of Contents
2.2 Examples of Full, Left, and Right outer joins
2.3 Examples of Left Semi and Left Anti joins
3. Cross join examples and a discussion of Natural join
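Joins of various kinds are important relational algebra operations implemented in databases, and Spark's data processing relies on them as well. This post summarizes the joins implemented in the current Spark version and gives an example of each to make them easier to understand.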
Function type | | Type 1: using joinExprs (supports both equi-joins and non-equi-joins) | Type 2: equi-join using column names (the result has no duplicated columns with the same name) | Type 3: join + predicate (equivalent to Type 1, but inner join only) | Type 4: cross join |
--- | --- | --- | --- | --- | --- |
Join Name | joinType in Spark | def join(right: Dataset[_], joinExprs: Column, joinType: String): DataFrame | def join(right: Dataset[_], usingColumns: Seq[String], joinType: String): DataFrame | | |
inner | inner | def join(right: Dataset[_], joinExprs: Column): DataFrame | def join(right: Dataset[_], usingColumns: Seq[String]): DataFrame; def join(right: Dataset[_], usingColumn: String): DataFrame | def join(right: Dataset[_]): DataFrame | NA |
full outer | outer, full, full_outer | NA | NA | | |
left outer | left, left_outer | NA | NA | | |
right outer | right, right_outer |