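Custom UDFs in Spark SQL
Spark SQL ships with many built-in functions, much like Hive. When the built-in functions cannot cover a requirement, you can define your own function (UDF) instead.
Characteristic: one in, one out (each input row produces exactly one output value)
- Register a UDF from an anonymous function
- Register a UDF from a named function
The code is as follows: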
package org.example

import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.{DataFrame, SparkSession}

object UDF {
  def main(args: Array[String]): Unit = {
    // Reduce log noise
    Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
    Logger.getLogger("org.eclipse.jetty.server").setLevel(Level.OFF)

    // Create the SparkSession; it manages its own SparkContext,
    // so a separate SparkContext is not needed
    val spark: SparkSession = SparkSession.builder()
      .appName("cai")
      .master("local[*]")
      .getOrCreate()

    // Read the JSON file (one JSON object per line)
    val rddDF: DataFrame = spark.read.json("test/users.json")

    /**
     * Register UDFs from anonymous functions
     */
    // The function bodies are kept deliberately simple in this example
    // Convert a string to upper case
    spark.udf.register("upper_test", (x: String) => x.toUpperCase())
    // Return the length of a string
    spark.udf.register("length_test", (x: String) => x.length)

    /**
     * Register a UDF from a named function
     * Registration differs slightly: append ` _` to the method name
     * (note the space before the underscore)
     */
    def age_panduan(age: Int): Boolean = age > 20
    spark.udf.register("age_panduan", age_panduan _)

    // Create a temporary view; use createGlobalTempView for a global one
    rddDF.createTempView("users")

    // SQL query
    // upper_test, age_panduan and length_test are the custom UDFs defined above;
    // abs, length, concat and cast are examples of common built-in functions
    spark.sql(
      """select
        |  upper_test(name) as upper_name,
        |  length_test(name) as name_length,
        |  age,
        |  phone,
        |  age_panduan(age) as isage,
        |  abs(age + 520) as abs,
        |  length(concat(cast(age as string), 'memeda')) as test_hanshu
        |from users""".stripMargin
    ).show()

    spark.stop()
  }
}
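The trailing ` _` used when registering the named function is ordinary Scala rather than anything Spark-specific: it turns a method into a function value (eta-expansion). A minimal sketch in plain Scala, with no Spark dependency, is shown here; the object name `EtaExpansionSketch` and the value names are hypothetical and chosen only for illustration.

```scala
object EtaExpansionSketch {
  // A named method with the same body as age_panduan in the article
  def age_panduan(age: Int): Boolean = age > 20

  // `age_panduan _` converts the method into a function value of type Int => Boolean
  val asFunction: Int => Boolean = age_panduan _

  // An equivalent anonymous function, written inline
  val anonymous: Int => Boolean = (age: Int) => age > 20

  def main(args: Array[String]): Unit = {
    // Both values behave the same way when applied to an argument
    println(asFunction(21))
    println(anonymous(21))
  }
}
```

Either form yields an `Int => Boolean` value, which is the kind of argument the `spark.udf.register` calls in the article receive.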
Test data (test/users.json)
{"name":"xiyangyang","age":20,"phone":15552211521}
{"name":"huitailang", "age":19,"phone":13287994007}
{"name":"小泽雅美", "age":21,"phone":15552211523}
{"name":"堺雅人","age":20,"phone":15552211521}
{"name":"罗伯特唐尼", "age":19,"phone":13287994007}
{"name":"金刚狼", "age":21,"phone":15552211523}
Result
+----------+-----------+---+-----------+-----+---+-----------+
|upper_name|name_length|age|      phone|isage|abs|test_hanshu|
+----------+-----------+---+-----------+-----+---+-----------+
|XIYANGYANG|         10| 20|15552211521|false|540|          8|
|HUITAILANG|         10| 19|13287994007|false|539|          8|
|  小泽雅美|          4| 21|15552211523| true|541|          8|
|    堺雅人|          3| 20|15552211521|false|540|          8|
|罗伯特唐尼|          5| 19|13287994007|false|539|          8|
|    金刚狼|          3| 21|15552211523| true|541|          8|
+----------+-----------+---+-----------+-----+---+-----------+
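The values in this table can be cross-checked without starting Spark by applying the same function bodies to the test data in plain Scala. The sketch below is only a sanity check under that assumption: the object name `UdfLogicCheck` and its member names are hypothetical, and the built-in SQL functions `abs`, `concat` and `length` are approximated with their Scala equivalents.

```scala
object UdfLogicCheck {
  // Same bodies as the UDFs registered in the article
  val upperTest: String => String = s => s.toUpperCase
  val lengthTest: String => Int = s => s.length
  def agePanduan(age: Int): Boolean = age > 20

  // (name, age) pairs copied from the test data
  val users: Seq[(String, Int)] = Seq(
    ("xiyangyang", 20), ("huitailang", 19), ("小泽雅美", 21),
    ("堺雅人", 20), ("罗伯特唐尼", 19), ("金刚狼", 21)
  )

  // One output row per input row:
  // (upper_name, name_length, isage, abs, test_hanshu)
  val rows: Seq[(String, Int, Boolean, Int, Int)] = users.map { case (name, age) =>
    (
      upperTest(name),
      lengthTest(name),
      agePanduan(age),
      math.abs(age + 520),             // stands in for SQL abs(age + 520)
      (age.toString + "memeda").length // stands in for length(concat(cast(age as string), 'memeda'))
    )
  }

  def main(args: Array[String]): Unit = rows.foreach(println)
}
```

Each input row produces exactly one output row, which is the one-in, one-out behaviour described at the start of the article.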