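Simplified requirement
A Hive table has three fields, id, score, and date, which hold the user, the credit score, and the month respectively. The data looks like this: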
1,50,201909
2,60,201909
1,30,201910
2,62,201910
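Each month a credit score is computed for every user from that user's spending, with one constraint: this month's score must not be lower than last month's.
Requirement analysis
In other words, once this month's score has been computed it is compared with last month's, and when this month's score is lower, last month's score replaces it. But how to implement that? Hive does not support row-level updates on an ordinary (non-transactional) table...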
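Before touching Hive or Spark, the rule itself can be written down in a few lines of plain Scala. This is only a minimal sketch to make the comparison concrete: the names `ScoreRow`, `CreditFloor`, and `applyFloor` are made up for illustration, and it assumes the scores have already been parsed as integers.

```scala
// Hypothetical names (ScoreRow, CreditFloor, applyFloor), for illustration only.
final case class ScoreRow(id: Int, score: Int, date: String)

object CreditFloor {
  // For each user, keep the larger of last month's and this month's score
  // and stamp the result with the current month.
  def applyFloor(rows: Seq[ScoreRow], lastMonth: String, thisMonth: String): Seq[ScoreRow] =
    rows
      .filter(r => r.date == lastMonth || r.date == thisMonth) // only the two months involved
      .groupBy(_.id)                                           // one group per user
      .map { case (id, rs) => ScoreRow(id, rs.map(_.score).max, thisMonth) }
      .toSeq
      .sortBy(_.id)

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      ScoreRow(1, 50, "201909"),
      ScoreRow(2, 60, "201909"),
      ScoreRow(1, 30, "201910"),
      ScoreRow(2, 62, "201910")
    )
    // Prints ScoreRow(1,50,201910) and ScoreRow(2,62,201910)
    applyFloor(rows, "201909", "201910").foreach(println)
  }
}
```

Note that in this sketch a user who only has a row for last month still gets a row for this month, carrying the old score forward. Whether that is wanted depends on the real requirement.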
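After racking my brain for a while, I decided to come at it from a different angle: all I really need is the larger of two values per user, and a window function can do exactly that.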
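To show what the window function does per user, here is a rough emulation of `row_number() over (partition by id order by score desc)` with ordinary Scala collections. It is a sketch, not how Spark executes the query; `RowNumberSketch` and `topPerId` are hypothetical names, and rows are simplified to `(id, score)` pairs with integer scores.

```scala
// Hypothetical names (RowNumberSketch, topPerId), for illustration only.
object RowNumberSketch {
  // Emulates: row_number() over (partition by id order by score desc), keeping rn = 1.
  def topPerId(rows: Seq[(Int, Int)]): Seq[(Int, Int)] =
    rows
      .groupBy(_._1)                                // partition by id
      .values
      .flatMap { part =>
        part
          .sortBy { case (_, score) => -score }     // order by score desc
          .zipWithIndex                             // 0-based position in the partition
          .collect { case (row, idx) if idx + 1 == 1 => row } // keep rn = 1
      }
      .toSeq
      .sortBy(_._1)

  def main(args: Array[String]): Unit = {
    val rows = Seq((1, 50), (2, 60), (1, 30), (2, 62))
    // Prints (1,50) and (2,62)
    topPerId(rows).foreach(println)
  }
}
```

Keeping only the first-ranked row of each partition is the same as taking the maximum score per id, which is why the ranking trick satisfies the "not lower than last month" rule when each user has just the two rows being compared.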
Implementation
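    import org.apache.log4j.{Level, Logger}
    import org.apache.spark.sql.SparkSession

    object Test {
      def main(args: Array[String]): Unit = {
        // Silence Spark's verbose logging
        Logger.getLogger("org").setLevel(Level.ERROR)
        val spark = SparkSession
          .builder()
          .appName(this.getClass.getSimpleName)
          .master("local[*]")
          .getOrCreate()
        import spark.implicits._
        import spark.sql

        // Load the "id,score,date" lines; the date column is not needed for the comparison
        spark.read.textFile("./data/comp")
          .map(_.split(","))
          // Parse score as Int so that "order by score" is numeric, not lexicographic
          .map(x => (x(0), x(1).toInt))
          .toDF("id", "score")
          .createTempView(viewName = "view")

        // Rank each user's rows by score, highest first, and keep only the top row
        sql(sqlText = "select id,score from (select id,score,row_number() " +
          "over(partition by id order by score desc) as rn from view) as t where t.rn = 1")
          .createTempView(viewName = "view1")

        // Label the result with the current month
        sql(sqlText = "select id,score,'201910' as date from view1").show()
        spark.stop()
      }
    }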
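One detail worth checking in code like the above is the type of the score column. Values split out of a text line are strings, and strings sort lexicographically, so "50" ranks above "100". The sample data here only has two-digit scores, so the difference would not show up. A small plain-Scala sketch of the pitfall, with the hypothetical names `ScoreOrdering`, `maxAsString`, and `maxAsInt`:

```scala
// Hypothetical names (ScoreOrdering, maxAsString, maxAsInt), for illustration only.
object ScoreOrdering {
  // Lexicographic maximum: compares character by character.
  def maxAsString(scores: Seq[String]): String = scores.max

  // Numeric maximum: parse first, then compare as integers.
  def maxAsInt(scores: Seq[String]): Int = scores.map(_.trim.toInt).max

  def main(args: Array[String]): Unit = {
    val scores = Seq("50", "100")
    println(maxAsString(scores)) // prints 50, because '5' > '1'
    println(maxAsInt(scores))    // prints 100
  }
}
```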
Result
+---+-----+------+
| id|score|  date|
+---+-----+------+
|  1|   50|201910|
|  2|   62|201910|
+---+-----+------+