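What does it do?
It uses the Spark Streaming window operation (reduceByKeyAndWindow) to count, for each ad, the number of clicks per minute over the past hour.
Requirement analysis:
Step-by-step breakdown
1. Convert the key to dateTime_adid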
import java.util.Date

// Map each log line to a (yyyyMMddHHmm_adid, 1L) pair
val key2TimeMinute = adRealTimeFilterDstream.map { log =>
  val logSplit = log.split(" ")
  // Field 0 is the click timestamp in milliseconds
  val timeStamp = logSplit(0).toLong
  // Truncate to minute precision: yyyyMMddHHmm
  val timeMinute = DateUtils.formatTimeMinute(new Date(timeStamp))
  // Field 4 is the ad id
  val adid = logSplit(4).toLong
  val key = timeMinute + "_" + adid
  (key, 1L)
}
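The key construction above can be tried outside Spark. The following is a minimal sketch, not the project's actual code: `formatTimeMinute` here is a hypothetical stand-in for `DateUtils.formatTimeMinute` (whose implementation is not shown), the `MinuteKey` object and the `zone` parameter are assumptions added for illustration, and the log layout (timestamp in field 0, ad id in field 4) is taken from the snippet above.

```scala
import java.text.SimpleDateFormat
import java.util.{Date, TimeZone}

object MinuteKey {
  // Hypothetical stand-in for DateUtils.formatTimeMinute: formats as yyyyMMddHHmm.
  // SimpleDateFormat is not thread-safe, so a new instance is created per call.
  def formatTimeMinute(date: Date, zone: TimeZone = TimeZone.getDefault): String = {
    val fmt = new SimpleDateFormat("yyyyMMddHHmm")
    fmt.setTimeZone(zone)
    fmt.format(date)
  }

  // Assumed log layout: space-separated, field 0 = epoch millis, field 4 = ad id.
  def toKey(log: String, zone: TimeZone = TimeZone.getDefault): (String, Long) = {
    val fields = log.split(" ")
    val timeMinute = formatTimeMinute(new Date(fields(0).toLong), zone)
    val adid = fields(4).toLong
    (timeMinute + "_" + adid, 1L)
  }
}
```

Because seconds and milliseconds are dropped by the format, all clicks on the same ad within the same minute end up with the same key, which is what lets the later reduce step add them together.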
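2. Window
Points to note:
- The window length and the slide interval must both be multiples of the batch interval, that is, the rate at which data is pulled from Kafka
- With a one-hour window you may see no results for a while when testing; in that case switch to shorter durations such as Seconds(10) and Seconds(5)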
val windowKey2 = key2TimeMinute.reduceByKeyAndWindow((a: Long, b: Long) => a + b, Minutes(60), Minutes(1))
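To make the window semantics concrete, here is a small simulation in plain Scala. It is only a sketch of the idea and not how Spark implements it: the `WindowSketch` object and its parameters are hypothetical names, and durations are expressed as a number of batches, which is why both durations have to be whole multiples of the batch interval.

```scala
object WindowSketch {
  // Each inner Seq is one micro-batch of (key, count) pairs.
  // windowBatches and slideBatches are measured in batch intervals.
  def reduceByKeyAndWindow(
      batches: Seq[Seq[(String, Long)]],
      windowBatches: Int,
      slideBatches: Int): Seq[Map[String, Long]] = {
    require(windowBatches > 0 && slideBatches > 0, "durations must be positive")
    // One result is produced after every slideBatches batches. It covers the
    // most recent windowBatches batches, or fewer near the start of the stream.
    (slideBatches to batches.length by slideBatches).map { end =>
      val start = math.max(0, end - windowBatches)
      batches.slice(start, end).flatten
        .groupBy(_._1)
        .map { case (key, pairs) => key -> pairs.map(_._2).sum }
    }
  }
}
```

With a window of 60 batches and a slide of 1 batch (one batch per minute), each result would hold the per-key totals for the last hour and a new result would appear every minute, which mirrors `Minutes(60), Minutes(1)` above.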
3. Wrap the results and write them to the database
import scala.collection.mutable.ArrayBuffer

windowKey2.foreachRDD { rdd =>
  rdd.foreachPartition { items =>
    // items is an iterator of (key, count) pairs for this partition
    val trendArray = new ArrayBuffer[AdClickTrend]()
    for ((key, count) <- items) {
      val keySplit = key.split("_")
      // timeMinute has the form yyyyMMddHHmm
      val timeMinute = keySplit(0)
      val date = timeMinute.substring(0, 8)
      val hour = timeMinute.substring(8, 10)
      val minute = timeMinute.substring(10)
      val adid = keySplit(1).toLong
      trendArray += AdClickTrend(date, hour, minute, adid, count)
    }
    // Debug output; this runs on the executors, not on the driver
    trendArray.foreach(println)
    // Enable this line to write the batch to the database:
    // AdClickTrendDAO.updateBatch(trendArray.toArray)
  }
}
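The key-splitting logic can also be pulled out into a small function and checked on its own. The sketch below rests on assumptions: the real `AdClickTrend` definition is not shown in these notes, so the case class here (including the field name `clickCount`) is inferred from the constructor call above, and `TrendParser` is a hypothetical helper name.

```scala
// Hypothetical shape of AdClickTrend, inferred from AdClickTrend(date, hour, minute, adid, count).
case class AdClickTrend(date: String, hour: String, minute: String, adid: Long, clickCount: Long)

object TrendParser {
  // Splits a "yyyyMMddHHmm_adid" key back into its parts.
  def toTrend(key: String, count: Long): AdClickTrend = {
    val keySplit = key.split("_")
    require(keySplit.length == 2, s"expected timeMinute_adid, got $key")
    val timeMinute = keySplit(0)
    require(timeMinute.length == 12, s"expected yyyyMMddHHmm, got $timeMinute")
    AdClickTrend(
      timeMinute.substring(0, 8),   // yyyyMMdd
      timeMinute.substring(8, 10),  // HH
      timeMinute.substring(10),     // mm
      keySplit(1).toLong,
      count)
  }
}
```

Inside `foreachPartition`, the loop body could then be reduced to `trendArray += TrendParser.toTrend(key, count)`, and malformed keys would fail early with a clear message instead of a substring error.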