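Background
A Row (Catalyst Row) represents one row of output from a relational operator. It is a generic row object with an ordered collection of fields, which can be accessed by index (generic access by ordinal), through typed getters such as getInt (native primitive access), by field name when the row carries a schema, or with Scala pattern matching.
To create a new Row, use RowFactory.create() in Java or Row.apply() in Scala.
Constructing a Row
The Row companion object provides factory methods that create Row instances from a variable-length list of values (apply), a sequence of values (fromSeq), or a tuple (fromTuple).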
import org.apache.spark.sql.Row
// Create a Row from values.
scala> Row(1, "hello")
res0: org.apache.spark.sql.Row = [1,hello]
// The same call with apply written out explicitly.
scala> Row.apply(1, "hello")
res1: org.apache.spark.sql.Row = [1,hello]
// Create a Row from a Seq of values.
scala> Row.fromSeq(Seq(1, "hello"))
res2: org.apache.spark.sql.Row = [1,hello]
// Create a Row from a tuple of values.
scala> Row.fromTuple((0, "hello"))
res3: org.apache.spark.sql.Row = [0,hello]
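The factory methods above can be combined in a small sketch. Because fromTuple takes a Product, a case class instance should work as well as a tuple. The Greeting case class and the value names below are hypothetical and exist only for illustration.

```scala
import org.apache.spark.sql.Row

// Hypothetical case class, used only for illustration.
case class Greeting(id: Int, text: String)

// Row.fromTuple accepts any Product, so a case class instance works too.
val fromCaseClass = Row.fromTuple(Greeting(1, "hello"))

// Row.fromSeq is convenient when the values are assembled dynamically.
val values: Seq[Any] = Seq(1, "hello")
val fromValues = Row.fromSeq(values)

// Rows compare by content, and toSeq returns the values in field order.
val sameContent = fromCaseClass == fromValues
val roundTrip = fromValues.toSeq
val fieldCount = fromValues.length
```

None of these rows carries a schema; they hold only the values, in order.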
Reading Row values
In general, Row values can be accessed by index (generic access by ordinal):
import org.apache.spark.sql.Row
scala> val row = Row(1, true, "a string", null)
row: org.apache.spark.sql.Row = [1,true,a string,null]
// by index
scala> val firstValue = row(0)
firstValue: Any = 1
scala> val fourthValue = row(3)
fourthValue: Any = null
// by get
scala> val firstValue = row.get(0)
firstValue: Any = 1
scala> val fourthValue = row.get(3)
fourthValue: Any = null
// by apply
scala> val firstValue = row.apply(0)
firstValue: Any = 1
scala> val fourthValue = row.apply(3)
fourthValue: Any = null
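Since generic access returns Any and a field may be null, it can help to use the typed getters together with isNullAt. The following is a minimal sketch on the same row; wrapping the nullable field in an Option is one possible convention, not something Row requires.

```scala
import org.apache.spark.sql.Row

val row = Row(1, true, "a string", null)

// Typed getters avoid the cast from Any.
val id: Int = row.getInt(0)
val flag: Boolean = row.getBoolean(1)
val text: String = row.getString(2)

// Check for null before reading a field that may be missing.
val fourthIsNull: Boolean = row.isNullAt(3)
val fourth: Option[String] =
  if (row.isNullAt(3)) None else Some(row.getString(3))

// anyNull reports whether at least one field in the row is null.
val hasAnyNull: Boolean = row.anyNull
```

Calling a primitive getter such as getInt on a null field fails at runtime, so the isNullAt check should come first.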
Generic access by ordinal (using index syntax, apply, or get) returns values of type Any. To read a field with its proper type, use getAs with an index.
scala> val row = Row(1, "hello")
row: org.apache.spark.sql.Row = [1,hello]
scala> row.getAs[Int](0)
res1: Int = 1
scala> row.getAs[String](1)
res2: String = hello
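getAs also accepts a field name, but only when the Row carries a schema, as rows taken from a DataFrame do. The sketch below assumes a local SparkSession created just for this example; the application name and the column names key and value are illustrative choices.

```scala
import org.apache.spark.sql.{Row, SparkSession}

// A local session, assumed here only so the example is self-contained;
// in spark-shell the predefined `spark` value can be used instead.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("row-by-name")
  .getOrCreate()
import spark.implicits._

// Rows taken from a DataFrame carry the DataFrame's schema.
val withSchema: Row = Seq((1, "hello")).toDF("key", "value").head()

// Look up fields by name instead of by position.
val key = withSchema.getAs[Int]("key")
val value = withSchema.getAs[String]("value")
val valueIndex = withSchema.fieldIndex("value")

// A Row built with Row(...) has no schema, so name lookup is not available.
val noSchema = Row(1, "hello").schema == null
```

Note that getAs performs an unchecked cast, so asking for the wrong type typically surfaces as an error only when the value is used.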
Row and pattern matching
In Scala, the fields of a Row can also be extracted with pattern matching. For example:
scala> val res4 = Row(1, "hello") match {
     |   case Row(key: Int, value: String) =>
     |     key -> value
     | }
res4: (Int, String) = (1,hello)
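A match with a single case throws a MatchError for rows of any other shape, and a typed pattern such as value: String does not match null. The sketch below adds cases for both situations; the describe function is a hypothetical helper, not part of the Row API.

```scala
import org.apache.spark.sql.Row

// Hypothetical helper that renders rows of a few expected shapes.
def describe(row: Row): String = row match {
  // Two fields with the expected types.
  case Row(key: Int, value: String) => s"$key -> $value"
  // A typed pattern never matches null, so handle it explicitly.
  case Row(key: Int, null) => s"$key -> (null)"
  // Fallback that avoids a MatchError for any other shape.
  case other => s"unexpected row of length ${other.length}"
}

val matched = describe(Row(1, "hello"))
val withNull = describe(Row(2, null))
val fallback = describe(Row("only one field"))
```

The order of the cases matters: the most specific patterns come first and the catch-all comes last.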