1. Filtering NULL values and selecting specific values in a DataFrame
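import org.apache.spark.sql.functions.{col, expr, lit}

val cityCodes = List(3134, 3124, 3154, 3242)
// Assumes `date` is defined elsewhere and `getVersion` is a UDF already registered with the session.
val result = df
  .filter(col("cate_code").isNotNull)                          // drop rows whose cate_code is NULL
  .filter(col("brand_code").isNotNull)                         // drop rows whose brand_code is NULL
  .filter(col("age").isInCollection(Iterable(0, 1, 2, 3, 4)))  // membership test against a collection
  .filter(col("level").isin(0, 1, 2, 3, 4))                    // membership test with varargs
  .filter(col("city").isin(cityCodes: _*))                     // expand a List into varargs
  // cate_code is already non-NULL at this point, so the IF is only a defensive guard
  .withColumn("version", expr("IF(cate_code IS NULL, 'unknown', getVersion(cate_code))"))
  .withColumn("type", lit(56))
  .withColumn("dt", lit(date))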
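2. Combining case syntax with match and map
Combining case clauses with match and map covers a common business scenario: handling data differently depending on which case it falls into.
2.1 match ... case ...
(1) Combined with regular expressions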
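val pattern1 = "spark.*".r
val pattern2 = "toby.*".r
def get(key: String): Any = key match {
  // Neither regex has a capture group, so the patterns take empty parentheses;
  // pattern1(_) would require exactly one group and never match.
  case pattern1() => pattern1.findAllIn(key).toList
  case pattern2() => pattern2.findFirstIn(key).toList
  case "default" => "default"
  case _ => ""
}
println(get("spark-definitive-guide")) // List(spark-definitive-guide)
println(get("toby-gao-my-name"))       // List(toby-gao-my-name)
println(get("this-is-test"))           // prints an empty line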
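When the regex does contain capture groups, each group binds to one variable in the case pattern. The following is a minimal sketch of that variant; the pattern `SparkTopic` and the helper `topic` are hypothetical names introduced only for illustration.

```scala
object RegexGroupDemo {
  // Hypothetical pattern: one capture group holding whatever follows "spark-"
  private val SparkTopic = "spark-(.*)".r

  // Returns the captured suffix, or None when the whole key does not match
  def topic(key: String): Option[String] = key match {
    case SparkTopic(rest) => Some(rest)
    case _                => None
  }
}
```

The number of variables in the pattern has to equal the number of capture groups in the regex, otherwise the case never matches.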
(2) Matching on a value to branch by case
val v2 = 1
val result = v2 match {
case 1 => "this is 1"
case 2 => "this is 2"
case _ => "default"
}
2.2 map ... case ...
Combining match ... case with the map method is useful when transforming data.
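// Example 1: x._1 requires each element to be a tuple (e.g. an RDD[(Int, Int)]), not a Row
df.map { x =>
  val id = x._1
  id match {
    case 1 => (10, x._2)
    case 2 => (20, x._2)
    case 3 => (30, x._2)
    case _ => x // keep other ids unchanged; without this case an unmatched id throws scala.MatchError
  }
}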
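// Example 2: the same remapping, written by reassigning a local variable
df.map { x =>
  var id = x._1 // must be a var: a val cannot be reassigned
  id match {
    case 1 => id = 10
    case 2 => id = 20
    case 3 => id = 30
    case _ => // keep the original id
  }
  (id, x._2)
}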
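// Example 3: two equivalent ways to append a derived field
df.map { case (id, num) => (id, num, num + 5) } // destructure the tuple with case
df.map(x => (x._1, x._2, x._2 + 5))             // use positional accessors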
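The map examples above can be tried without a cluster on an ordinary Scala collection. This is only a sketch: `pairs` is hypothetical sample data standing in for the (id, num) elements, and a plain `Seq` replaces the distributed collection.

```scala
object MapCaseDemo {
  // Hypothetical sample data standing in for the (id, num) pairs
  val pairs: Seq[(Int, Int)] = Seq((1, 100), (2, 200), (3, 300), (4, 400))

  // Remap known ids with case clauses written directly inside map;
  // the final case keeps unknown ids unchanged, so no MatchError is thrown.
  val remapped: Seq[(Int, Int)] = pairs.map {
    case (1, num) => (10, num)
    case (2, num) => (20, num)
    case (3, num) => (30, num)
    case other    => other
  }

  // Destructure each pair and append a derived third field.
  val widened: Seq[(Int, Int, Int)] = pairs.map { case (id, num) => (id, num, num + 5) }
}
```

Writing the case clauses directly in the braces passed to map removes the intermediate `x` and the explicit `match`, which is often easier to read when the element is a tuple.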
2.3 filter ... case ...
// Example 4: keep only the elements whose id equals 1
df.filter { case (id, _) => id == 1 }
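The same filter can be checked locally on a plain Scala collection. The sketch below uses hypothetical sample data in `pairs`, and also shows `collect` with a partial function, which on Scala collections filters and transforms in a single step.

```scala
object FilterCaseDemo {
  // Hypothetical sample data standing in for the (id, num) pairs
  val pairs: Seq[(Int, Int)] = Seq((1, 100), (2, 200), (1, 300))

  // Keep only the pairs whose id is 1; the second field is ignored with _
  val onlyIdOne: Seq[(Int, Int)] = pairs.filter { case (id, _) => id == 1 }

  // collect applies the partial function only where a case matches,
  // so it keeps pairs with id 1 and extracts their num in one step
  val numsForIdOne: Seq[Int] = pairs.collect { case (1, num) => num }
}
```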