import org.apache.spark.sql.functions._
val longLength = udf((bookTitle: String, length: Int) => bookTitle.length > length)
import sqlContext.implicits._
val booksWithLongTitle = dataFrame.filter(longLength($"title", $"10"))
Note that the sqlContext in this snippet is a previously instantiated SQLContext object. Unfortunately, running this code throws an exception:
cannot resolve '10' given input columns id, title, author, price, publishedDate;
This is because wrapping a constant in $ makes Spark mistake it for a Column reference. The fix is the lit function, defined in org.apache.spark.sql.functions, which turns the constant into a literal Column:
val booksWithLongTitle = dataFrame.filter(longLength($"title", lit(10)))
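Putting the pieces together, here is a minimal end-to-end sketch using the Spark 1.x-style SQLContext shown above. The sample rows and the object name are hypothetical, standing in for the article's dataFrame:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions._

object LongTitleDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("LongTitleDemo").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Hypothetical sample data standing in for the article's dataFrame
    val dataFrame = sc.parallelize(Seq(
      ("1", "Functional Programming in Scala", "Chiusano", 40.0),
      ("2", "Spark", "Zaharia", 30.0)
    )).toDF("id", "title", "author", "price")

    // UDF comparing the title length against an Int threshold
    val longLength = udf((bookTitle: String, length: Int) => bookTitle.length > length)

    // lit(10) turns the constant into a literal Column, so the UDF
    // receives an Int rather than a (nonexistent) column named "10"
    val booksWithLongTitle = dataFrame.filter(longLength($"title", lit(10)))
    booksWithLongTitle.show()
  }
}
```

For this particular check, the built-in length function would also work without a UDF: dataFrame.filter(length($"title") > 10).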