Users can register custom functions through the spark.udf facility to implement their own logic.
{"username": "daidai","age": 40}
{"username": "huanhuan","age":18}
Requirement: prepend NAME: to every value in the username column.
// Current output
+---+--------+
|age|username|
+---+--------+
| 40| daidai|
| 18|huanhuan|
+---+--------+
// Expected output
+---+-------------+
|age|     username|
+---+-------------+
| 40|  NAME:daidai|
| 18|NAME:huanhuan|
+---+-------------+
Using the spark.udf.register function, here is the code:
package com.sql

import org.apache.log4j.{Level, Logger}
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object Spark_UDF {
  def main(args: Array[String]): Unit = {
    Logger.getLogger("org").setLevel(Level.ERROR)
    // Create the Spark environment
    val conf = new SparkConf().setMaster("local[*]").setAppName(this.getClass.getName)
    val spark = SparkSession.builder().config(conf).getOrCreate()
    // Read the file into a DataFrame and register it as a temporary view named "user"
    val df = spark.read.json("data/user.json")
    df.createTempView("user")
    // Register a UDF named "newName" that takes one String parameter called name
    spark.udf.register("newName", (name: String) => {
      // Prepend "NAME:" to each name
      "NAME:" + name
    })
    // Query through the UDF
    spark.sql("select age, newName(username) as username from user").show()
    // Stop the Spark application
    spark.stop()
  }
}
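Besides registering a UDF for SQL, the same transformation can be applied directly through the DataFrame API with `org.apache.spark.sql.functions.udf`, which wraps a Scala function into a column expression. Below is a minimal sketch assuming the same `data/user.json` file as above; the object name `Spark_UDF_DF` and the value name `addName` are chosen here for illustration:

```scala
package com.sql

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

object Spark_UDF_DF {
  def main(args: Array[String]): Unit = {
    // Create the Spark environment
    val conf = new SparkConf().setMaster("local[*]").setAppName(this.getClass.getName)
    val spark = SparkSession.builder().config(conf).getOrCreate()

    val df = spark.read.json("data/user.json")

    // Wrap the Scala function as a column-expression UDF
    val addName = udf((name: String) => "NAME:" + name)

    // Apply the UDF to the username column; no SQL registration needed
    df.withColumn("username", addName(col("username")))
      .select("age", "username")
      .show()

    spark.stop()
  }
}
```

This form avoids creating a temporary view entirely: the UDF is applied as an ordinary column expression, so it composes with other DataFrame operations such as `filter` or `groupBy`.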