1 Using a custom UDAF to compute an average (avg)
The data is as follows:
name,salary,dept
laoduan,500000,teacher
xiaolin,20000,student
laozhao,40000,teacher
xiaolei,19000,student
xiaona,21000,waiter
Approach 1: Use the old API (deprecated since Spark 3.0). First create a class that extends UserDefinedAggregateFunction and implements its 8 methods.
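To make the roles of these methods concrete, here is a minimal plain-Scala sketch with no Spark dependency. It assumes the goal is the average salary per dept and that the buffer holds a running total and a head count, mirroring the `totalmoney`/`numpeople` fields of the `bufferSchema` in the class that follows. Only `initialize`, `update`, `merge` and `evaluate` are modeled, as pure functions; the schema-declaring methods (`inputSchema`, `bufferSchema`, `dataType`, `deterministic`) have no runtime behavior to show. The names `AvgBuffer`, `AvgLifecycle` and `averageByDept`, and the two-partition split, are hypothetical and only illustrate how partial buffers are combined. This is not the Spark API.

```scala
// Hypothetical plain-Scala model of the UDAF buffer lifecycle (no Spark).
// The buffer mirrors bufferSchema: (totalmoney: Double, numpeople: Int).
final case class AvgBuffer(totalMoney: Double, numPeople: Int)

object AvgLifecycle {
  // initialize: every group starts from an empty buffer
  def initialize(): AvgBuffer = AvgBuffer(0.0, 0)

  // update: fold one input value (a salary) into a partition-local buffer
  def update(buf: AvgBuffer, salary: Double): AvgBuffer =
    AvgBuffer(buf.totalMoney + salary, buf.numPeople + 1)

  // merge: combine two partial buffers, e.g. from different partitions
  def merge(a: AvgBuffer, b: AvgBuffer): AvgBuffer =
    AvgBuffer(a.totalMoney + b.totalMoney, a.numPeople + b.numPeople)

  // evaluate: turn the final buffer into the result (a Double)
  def evaluate(buf: AvgBuffer): Double = buf.totalMoney / buf.numPeople

  // Sample rows from the data above: (name, salary, dept)
  val rows: Seq[(String, Double, String)] = Seq(
    ("laoduan", 500000.0, "teacher"),
    ("xiaolin", 20000.0, "student"),
    ("laozhao", 40000.0, "teacher"),
    ("xiaolei", 19000.0, "student"),
    ("xiaona", 21000.0, "waiter")
  )

  def averageByDept(partitions: Seq[Seq[(String, Double, String)]]): Map[String, Double] = {
    // Partial aggregation inside each partition: initialize + update
    val partials: Seq[Map[String, AvgBuffer]] = partitions.map { part =>
      part.foldLeft(Map.empty[String, AvgBuffer]) { case (acc, (_, salary, dept)) =>
        acc.updated(dept, update(acc.getOrElse(dept, initialize()), salary))
      }
    }
    // Final aggregation: merge partial buffers of the same dept
    val merged: Map[String, AvgBuffer] =
      partials.foldLeft(Map.empty[String, AvgBuffer]) { (acc, part) =>
        part.foldLeft(acc) { case (m, (dept, buf)) =>
          m.updated(dept, merge(m.getOrElse(dept, initialize()), buf))
        }
      }
    // evaluate once per group
    merged.map { case (dept, buf) => dept -> evaluate(buf) }
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical split into two partitions so that merge is exercised
    val (p1, p2) = rows.splitAt(2)
    averageByDept(Seq(p1, p2)).toSeq.sortBy(_._1).foreach { case (dept, avg) =>
      println(s"$dept -> $avg")
    }
  }
}
```

With this split, teacher and student rows land in both partitions, so their averages (270000.0 and 19500.0) only come out right if `merge` adds both the totals and the counts; averaging the two partial averages instead would give a wrong result whenever the partitions hold different numbers of rows.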
import org.apache.spark.sql.{Row, types}
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types.{DataType, DoubleType, IntegerType, StructField, StructType}

class MyAvgFunction2 extends UserDefinedAggregateFunction {
  // Type of the input data
  override def inputSchema: StructType = StructType(List(
    StructField("in", DoubleType)))

  // Types of the intermediate data kept in the aggregation buffer
  override def bufferSchema: StructType = StructType(List(
    StructField("totalmoney", DoubleType),
    StructField("numpeople", IntegerType)
  ))

  // Type of the returned value
  ove