References:
1. https://spark.apache.org/docs/latest/sql-getting-started.html#untyped-user-defined-aggregate-functions
2. https://spark.apache.org/docs/latest/sql-getting-started.html#type-safe-user-defined-aggregate-functions
3. https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-udfs.html
- Use via Spark SQL – suitable for simple per-row operations
import spark.implicits._  // needed for toDF outside spark-shell
val dataset = Seq((0, "hello"), (1, "world")).toDF("id", "text")
dataset.createOrReplaceTempView("test")
// Register a plain Scala function as a SQL UDF named "myUpper"
spark.udf.register("myUpper", (input: String) => input.toUpperCase)
// Alias the UDF result so the output column is named "upper"
spark.sql("select id, text, myUpper(text) as upper from test").show()
The result:

| id | text | upper |
|----|-------|-------|
| 0  | hello | HELLO |
| 1  | world | WORLD |
- Use via withColumn – suitable for simple per-row operations
import org.apache.spark.sql.functions.{col, udf}
// Wrap a plain Scala function as a Column-based UDF
val upper: String => String = _.toUpperCase
val upperUDF = udf(upper)
// Add an "upper" column computed by applying the UDF to "text"
dataset.withColumn("upper", upperUDF(col("text"))).show()
The result:

| id | text | upper |
|----|-------|-------|
| 0  | hello | HELLO |
| 1  | world | WORLD |
- Extend UserDefinedAggregateFunction – an untyped aggregate function over DataFrame rows, for when the operation needs to aggregate across rows (see the sketch below)
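A minimal sketch of such an untyped UDAF, closely following the MyAverage example from the untyped user-defined aggregate functions section of the Spark SQL guide linked at the top; the myAverage name and the reuse of the id column of the test view are illustrative assumptions.

import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types._

object MyAverage extends UserDefinedAggregateFunction {
  // Schema of the input arguments: a single Long column
  def inputSchema: StructType = StructType(StructField("inputColumn", LongType) :: Nil)
  // Schema of the aggregation buffer: running sum and count
  def bufferSchema: StructType =
    StructType(StructField("sum", LongType) :: StructField("count", LongType) :: Nil)
  // Type of the final result
  def dataType: DataType = DoubleType
  // The same input always produces the same output
  def deterministic: Boolean = true
  // Initialize the buffer
  def initialize(buffer: MutableAggregationBuffer): Unit = {
    buffer(0) = 0L
    buffer(1) = 0L
  }
  // Fold one input row into the buffer
  def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
    if (!input.isNullAt(0)) {
      buffer(0) = buffer.getLong(0) + input.getLong(0)
      buffer(1) = buffer.getLong(1) + 1
    }
  }
  // Merge two partial buffers (e.g. from different partitions)
  def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = {
    buffer1(0) = buffer1.getLong(0) + buffer2.getLong(0)
    buffer1(1) = buffer1.getLong(1) + buffer2.getLong(1)
  }
  // Compute the final result from the buffer
  def evaluate(buffer: Row): Double = buffer.getLong(0).toDouble / buffer.getLong(1)
}

// Register it like a UDF and call it from SQL;
// id is an Int column, so cast it to match the Long input schema
spark.udf.register("myAverage", MyAverage)
spark.sql("SELECT myAverage(CAST(id AS BIGINT)) AS avg_id FROM test").show()

Note that UserDefinedAggregateFunction is deprecated as of Spark 3.0 in favor of registering an Aggregator through org.apache.spark.sql.functions.udaf.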
- Extend Aggregator – a type-safe aggregate function for Datasets (see the sketch below)
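A minimal sketch of a type-safe aggregator, adapted from the type-safe example in the Spark SQL guide linked at the top to the id/text records used above; the MyTypedAverage, Record and Average names are illustrative assumptions.

import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.expressions.Aggregator

case class Record(id: Long, text: String)
case class Average(var sum: Long, var count: Long)

object MyTypedAverage extends Aggregator[Record, Average, Double] {
  // Zero value of the aggregation buffer
  def zero: Average = Average(0L, 0L)
  // Fold one input record into the buffer
  def reduce(buffer: Average, record: Record): Average = {
    buffer.sum += record.id
    buffer.count += 1
    buffer
  }
  // Merge two partial buffers
  def merge(b1: Average, b2: Average): Average = {
    b1.sum += b2.sum
    b1.count += b2.count
    b1
  }
  // Transform the final buffer into the result
  def finish(reduction: Average): Double = reduction.sum.toDouble / reduction.count
  // Encoders for the buffer and output types
  def bufferEncoder: Encoder[Average] = Encoders.product
  def outputEncoder: Encoder[Double] = Encoders.scalaDouble
}

// Use it as a typed column on a Dataset[Record]
import spark.implicits._
val ds = Seq(Record(0, "hello"), Record(1, "world")).toDS()
ds.select(MyTypedAverage.toColumn.name("avg_id")).show()

Since Spark 3.0 such an Aggregator can also be registered for SQL use via org.apache.spark.sql.functions.udaf, replacing the deprecated UserDefinedAggregateFunction shown above.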