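1. Requirements analysis
Read the input data and aggregate it: sum the given elements, compute their average, and return both as a single string of the form "sum->average", with the average formatted to two decimal places.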
2. Custom UDAF implementation
import java.text.DecimalFormat
import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types._
/**
 * Given the input values, compute their sum and average and return them as a string.
 */
class MySumUDAF extends UserDefinedAggregateFunction {
  // Input type: a single Long column
  override def inputSchema: StructType = StructType(StructField("num", LongType) :: Nil)

  // Buffer layout: index 0 holds the running sum, index 1 holds the row count
  override def bufferSchema: StructType =
    StructType(StructField("sum", LongType) :: StructField("count", LongType) :: Nil)

  // Return type
  override def dataType: DataType = StringType

  // Deterministic: the same input always produces the same output
  override def deterministic: Boolean = true

  // Initialize the buffer: sum = 0, count = 0
  override def initialize(buffer: MutableAggregationBuffer): Unit = {
    buffer(0) = 0L
    buffer(1) = 0L
  }

  // Within a partition: fold one input row into the buffer
  override def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
    // Skip null inputs; getLong would fail on them
    if (!input.isNullAt(0)) {
      buffer(0) = buffer.getLong(0) + input.getLong(0)
      buffer(1) = buffer.getLong(1) + 1
    }
  }

  // Across partitions: merge buffer2 into buffer1
  override def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = {
    buffer1(0) = buffer1.getLong(0) + buffer2.getLong(0)
    buffer1(1) = buffer1.getLong(1) + buffer2.getLong(1)
  }

  // Final result: "sum->average", with the average formatted to two decimal places
  override def evaluate(buffer: Row): String = {
    val res: Double = buffer.getLong(0).toDouble / buffer.getLong(1).toDouble
    val format = new DecimalFormat("0.00")
    s"${buffer.getLong(0)}->${format.format(res)}"
  }
}
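To make the order of the callbacks easier to follow, here is a minimal sketch in plain Scala, without Spark, that mimics the initialize / update / merge / evaluate steps on two hand-made "partitions". The object name UdafLifecycleSketch, the Buf alias, and the run helper are hypothetical names introduced only for this illustration; Spark's real buffer handling differs. The formatter is pinned to Locale.ROOT (an addition not in the class above) so that the decimal separator does not depend on the machine's default locale.

```scala
import java.text.{DecimalFormat, DecimalFormatSymbols}
import java.util.Locale

// Hypothetical stand-in for the UDAF buffer: (sum, count)
object UdafLifecycleSketch {
  type Buf = (Long, Long)

  // Corresponds to initialize: sum = 0, count = 0
  val zero: Buf = (0L, 0L)

  // Corresponds to update: fold one input value into the buffer
  def update(b: Buf, x: Long): Buf = (b._1 + x, b._2 + 1)

  // Corresponds to merge: combine two partial buffers
  def merge(a: Buf, b: Buf): Buf = (a._1 + b._1, a._2 + b._2)

  // Corresponds to evaluate: format "sum->average" with two decimals
  def evaluate(b: Buf): String = {
    val avg = b._1.toDouble / b._2.toDouble
    val fmt = new DecimalFormat("0.00", DecimalFormatSymbols.getInstance(Locale.ROOT))
    s"${b._1}->${fmt.format(avg)}"
  }

  // Aggregate each partition separately, then merge the partial buffers
  def run(partitions: Seq[Seq[Long]]): String = {
    val partials = partitions.map(_.foldLeft(zero)(update))
    evaluate(partials.foldLeft(zero)(merge))
  }

  def main(args: Array[String]): Unit = {
    // Two simulated partitions holding 1..5 and 6..10
    println(run(Seq(Seq(1L, 2L, 3L, 4L, 5L), Seq(6L, 7L, 8L, 9L, 10L)))) // prints 55->5.50
  }
}
```

Because merge is associative and zero is its identity, the result does not depend on how the values are split across partitions.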
3. Test
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession
object MySumUDAFDemo {
  def main(args: Array[String]): Unit = {
    val spark: SparkSession = SparkSession.builder()
      .master("local[2]")
      .appName("MySumUDAFDemo")
      .getOrCreate()
    // Register the custom UDAF under the SQL name "mysum"
    spark.udf.register("mysum", new MySumUDAF)
    import spark.implicits._
    val list: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
    val rddList: RDD[Int] = spark.sparkContext.makeRDD(list)
    // Convert the RDD to a DataFrame and expose it as a temporary view
    rddList.toDF("num").createOrReplaceTempView("tmp")
    spark.sql("select mysum(num) from tmp").show()
    spark.close()
  }
}
Output:
+------------------------------+
|mysumudaf(CAST(num AS BIGINT))|
+------------------------------+
|                      55->5.50|
+------------------------------+
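Note that UserDefinedAggregateFunction is deprecated since Spark 3.0 in favor of a typed Aggregator registered through functions.udaf. The following is a sketch of how the same aggregation might look in that style, assuming Spark 3.0 or later on the classpath; the names SumCount, MySumAggregator, MySumAggregatorDemo, and mysum_agg are hypothetical and not part of the original code.

```scala
import java.text.DecimalFormat
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}
import org.apache.spark.sql.expressions.Aggregator
import org.apache.spark.sql.functions.udaf

// Hypothetical buffer type: running sum and row count
case class SumCount(sum: Long, count: Long)

// Hypothetical Aggregator equivalent of MySumUDAF: Long in, "sum->average" out
object MySumAggregator extends Aggregator[Long, SumCount, String] {
  // Empty buffer (plays the role of initialize)
  override def zero: SumCount = SumCount(0L, 0L)

  // Fold one input value into the buffer (plays the role of update)
  override def reduce(b: SumCount, a: Long): SumCount = SumCount(b.sum + a, b.count + 1)

  // Combine two partial buffers (plays the role of merge)
  override def merge(b1: SumCount, b2: SumCount): SumCount =
    SumCount(b1.sum + b2.sum, b1.count + b2.count)

  // Produce the final string (plays the role of evaluate)
  override def finish(r: SumCount): String = {
    val avg = r.sum.toDouble / r.count.toDouble
    s"${r.sum}->${new DecimalFormat("0.00").format(avg)}"
  }

  override def bufferEncoder: Encoder[SumCount] = Encoders.product[SumCount]
  override def outputEncoder: Encoder[String] = Encoders.STRING
}

object MySumAggregatorDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("MySumAggregatorDemo")
      .getOrCreate()
    import spark.implicits._
    // Wrap the typed Aggregator so it can be called from SQL
    spark.udf.register("mysum_agg", udaf(MySumAggregator))
    (1L to 10L).toList.toDF("num").createOrReplaceTempView("tmp")
    spark.sql("select mysum_agg(num) from tmp").show()
    spark.stop()
  }
}
```

With this style the buffer is an ordinary case class, so the sum and count fields are accessed by name rather than by position.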