spark-12.sparkSQL_3_sparkSQL Custom Functions

Latest recommended article published on 2023-12-21 08:28:53

蒙面小生

Category: Spark. Tags: spark, sparkSQL

Article link: https://blog.csdn.net/qq_30657195/article/details/106972600

UDF functions

Register a UDF with spark.udf.register("name", func), then call it directly in SQL under its registered name, as in select name(...) .... For example:

val peopleDF = spark.read.json("examples/src/main/resources/people.json")
peopleDF.createOrReplaceTempView("people")
spark.udf.register("add",(x:String)=>"A:"+x)
spark.sql("select add(name) from people").show
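The snippet above assumes a spark-shell session and the people.json file that ships with Spark. As a rough standalone sketch of the same registration, the program below builds its own local SparkSession and uses a few inline rows in place of the JSON file. The object name UdfSketch, the value addPrefix, and the column alias tagged are illustrative and do not come from the original post. It also shows the DataFrame route through functions.udf, which needs no name registration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

object UdfSketch {
  // The UDF logic as a plain Scala function, so it can be checked without Spark.
  val addPrefix: String => String = x => "A:" + x

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only (assumption: Spark 2.x or later on the classpath).
    val spark = SparkSession.builder()
      .appName("UdfSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Inline stand-in for people.json (hypothetical rows).
    val peopleDF = Seq("Michael", "Andy", "Justin").toDF("name")
    peopleDF.createOrReplaceTempView("people")

    // SQL route: register under a name, then call it by that name in a query.
    spark.udf.register("add", addPrefix)
    spark.sql("select add(name) as tagged from people").show()

    // DataFrame route: wrap the same function with udf(...) and apply it to a column.
    val addUdf = udf(addPrefix)
    peopleDF.select(addUdf(col("name")).as("tagged")).show()

    spark.stop()
  }
}
```

Keeping the logic in an ordinary function value makes it easy to unit-test separately from the Spark wiring.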

UDAF functions

1. Weakly typed UDAF functions

Extend the UserDefinedAggregateFunction class and override its methods.
Register the UDAF.
Use the custom UDAF.

For example:

package com.dengdan.sparksql

import org.apache.spark.SparkConf
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types.{DataType, DoubleType, IntegerType, LongType, StructField, StructType}

/**
 * Custom UDAF function.
 * Sample data:
 * {"name":"Michael", "salary":3000}
 * {"name":"Andy", "salary":4500}
 * {"name":"Justin", "salary":3500}
 * {"name":"Berta", "salary":4000}
 * Goal: compute the average salary [total salary, number of salaries].
 */
class AverageSal extends UserDefinedAggregateFunction {

  // Input data
  override def inputSchema: StructType = StructType(StructField("salary", LongType) :: Nil)

  // Shared variables (the aggregation buffer) within each partition
  override def bufferSchema: StructType = StructType(StructField("sum", LongType) ::
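The listing above breaks off in the source partway through bufferSchema. What follows is not the author's code. It is a minimal sketch of how a weakly typed average UDAF along these lines could be completed, registered, and called. The class name AverageSalSketch, the count buffer field, the view name employees, and the SQL name average_sal are all assumptions made for illustration. UserDefinedAggregateFunction is deprecated since Spark 3.0, so this compiles with a deprecation warning on newer versions.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types.{DataType, DoubleType, LongType, StructField, StructType}

// Hypothetical completion of an average-salary UDAF (not the original author's class).
class AverageSalSketch extends UserDefinedAggregateFunction {
  // One input column: the salary.
  override def inputSchema: StructType = StructType(StructField("salary", LongType) :: Nil)

  // Buffer holds a running sum and a running count (the count field is an assumption).
  override def bufferSchema: StructType =
    StructType(StructField("sum", LongType) :: StructField("count", LongType) :: Nil)

  // Type of the final result.
  override def dataType: DataType = DoubleType

  // The same input always produces the same output.
  override def deterministic: Boolean = true

  // Start each buffer at sum = 0, count = 0.
  override def initialize(buffer: MutableAggregationBuffer): Unit = {
    buffer(0) = 0L
    buffer(1) = 0L
  }

  // Fold one input row into a buffer, skipping null salaries.
  override def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
    if (!input.isNullAt(0)) {
      buffer(0) = buffer.getLong(0) + input.getLong(0)
      buffer(1) = buffer.getLong(1) + 1L
    }
  }

  // Combine two partial buffers (for example, from different partitions).
  override def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = {
    buffer1(0) = buffer1.getLong(0) + buffer2.getLong(0)
    buffer1(1) = buffer1.getLong(1) + buffer2.getLong(1)
  }

  // Final result: sum / count, or null when no rows were aggregated.
  override def evaluate(buffer: Row): Any =
    if (buffer.getLong(1) == 0L) null
    else buffer.getLong(0).toDouble / buffer.getLong(1)
}

object AverageSalSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("AverageSalSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // The sample rows from the comment in the listing above, built inline.
    val employees = Seq(("Michael", 3000L), ("Andy", 4500L), ("Justin", 3500L), ("Berta", 4000L))
      .toDF("name", "salary")
    employees.createOrReplaceTempView("employees")

    // Register the UDAF under a SQL name, then use it like a built-in aggregate.
    spark.udf.register("average_sal", new AverageSalSketch)
    spark.sql("select average_sal(salary) as avg_salary from employees").show()

    spark.stop()
  }
}
```

For the four sample rows the total is 15000 over 4 salaries, so the query should report an average of 3750.0.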
