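Solutions to some classic scenarios in Scala
1. Passing an external variable into a UDF on a Spark DataFrame (20190925)
Solution: use a Scala closure.
Example: for rows whose value appears in valueList, reformat the date column.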
package antistop

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object Test {
  def main(args: Array[String]): Unit = {
    val conf: SparkConf = new SparkConf().setAppName(getClass.getSimpleName).setMaster("local[2]")
    val sc = new SparkContext(conf)
    val spark = new HiveContext(sc)
    import spark.implicits._
    import org.apache.spark.sql.functions._

    var df = Seq(
      (1, "A", "2019-01-01"),
      (2, "B", "2019-01-02"),
      (3, "C", "2019-01-03"),
      (3, "D", "2019-01-04")
    ).toDF("id", "value", "date")
    val valueList: List[String] = List("A", "B", "C")
    // The function literal closes over valueList, so the UDF itself only takes two columns
    val factor = (value: String, date: String) => formatData(value, date, valueList)
    val udf_dpi = spark.udf.register("udf_dpi", factor)
    df = df.withColumn("date", udf_dpi(col("value"), col("date")))
    df.show()
  }

  // Strip the dashes from date when value is in valueList; otherwise return date unchanged
  def formatData(value: String, date: String, valueList: List[String]): String =
    if (valueList.contains(value)) date.replace("-", "") else date
}
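The closure part of this example can be tried without Spark. The following is a minimal sketch, not the original author's code: the object name ClosureSketch and the helper makeFormatter are hypothetical names introduced for illustration, and formatData repeats the logic of the example above. It shows that the returned two-argument function keeps a reference to valueList, which is what lets the UDF take only column arguments.

```scala
object ClosureSketch {
  // Same logic as formatData in the Spark example.
  def formatData(value: String, date: String, valueList: List[String]): String =
    if (valueList.contains(value)) date.replace("-", "") else date

  // Hypothetical helper: returns a two-argument function that closes over valueList.
  def makeFormatter(valueList: List[String]): (String, String) => String =
    (value, date) => formatData(value, date, valueList)

  def main(args: Array[String]): Unit = {
    val factor = makeFormatter(List("A", "B", "C"))
    val rows = Seq(("A", "2019-01-01"), ("D", "2019-01-04"))
    // "A" is in the list, so its date is reformatted; "D" is not, so its date is unchanged.
    rows.foreach { case (v, d) => println(s"$v -> ${factor(v, d)}") }
  }
}
```

In Spark, the captured value is serialized along with the function and shipped to the executors, so it should be small and serializable; for a large lookup list, a broadcast variable is probably the better fit.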
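2. Creating a POJO instance (20191012)
Problem: when a POJO is needed, there are several ways to get one.
Solutions:
- Define the class in its own class (object) file (much the same as in Java, so not covered here)
- Add a case class definition inside the current class
- Instantiate an anonymous class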
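// Add a case class definition inside the current class
case class User(name: String, age: Int)
val data1 = User("ncuwen", 1)

// Anonymous instantiation with an explicit structural type
val A: {
  val name: String
  val age: Int
} = new {
  val name: String = "ncuwen"
  val age: Int = 0
}
// Interpolate the fields; println(A.name, ":", A.age) would print a tuple instead
println(s"${A.name}:${A.age}")

// Shorthand form: let the compiler infer the structural type
val B = new {
  val name: String = "ncuwen"
  val age: Int = 0
}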
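To compare the two in-class approaches, here is a small sketch that assumes Scala 2.x (Scala 3 handles structural types differently, so it may need adjusting there). The object name PojoSketch is hypothetical. The case class comes with equals, toString, and copy, while reading a field of the anonymous instance goes through reflection, which is why the reflectiveCalls import is added.

```scala
import scala.language.reflectiveCalls // field access on a structural type uses reflection in Scala 2

object PojoSketch {
  case class User(name: String, age: Int)

  def main(args: Array[String]): Unit = {
    val u1 = User("ncuwen", 1)
    val u2 = u1.copy(age = 2)          // new instance with one field changed
    println(u1 == User("ncuwen", 1))   // case classes compare by field values
    println(u2)

    // Anonymous instance: handy for one-off values, but it has no reusable named type.
    val anon = new { val name: String = "ncuwen"; val age: Int = 0 }
    println(s"${anon.name}:${anon.age}")
  }
}
```

When the same shape is used in more than one place, the case class is usually the safer choice; the anonymous form suits throwaway values local to one method.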
More to come…