Spark SQL + Spark Streaming Example

霄嵩

Category: Spark Streaming; tags: spark

Article link: https://blog.csdn.net/accptanggang/article/details/53117113

package SparkStreaming

import org.apache.spark.SparkConf
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
import org.apache.spark.streaming.{Seconds, StreamingContext}

/**
  * Created by tg on 11/9/16.
  * Every 10 seconds, count the clicks on each product in each category
  * over the last 60 seconds, then find the 3 most-clicked products.
  */
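object Top3Demo {
  def main(args: Array[String]): Unit = {
    // local[2]: the socket receiver occupies one thread, so at least two are needed
    val conf = new SparkConf().setAppName("Top3Demo")
      .setMaster("local[2]")
    // Batch interval: 5 seconds
    val ssc = new StreamingContext(conf, Seconds(5))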

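    /**
      * Read product click logs from an nc (netcat) server.
      * Each log line has three space-separated fields:
      *   user product category
      * For example:
      *   张三 iphone mobile
      *   李四 vivo mobile
      */
    val linesDStream = ssc.socketTextStream("tgmaster", 9999)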

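    /**
      * Turn each line into a (category-product, 1) pair.
      * Lines with fewer than three fields are dropped; otherwise logInfo(2)
      * would throw ArrayIndexOutOfBoundsException and fail the batch's job.
      */
    val pairDStream = linesDStream
      .map(_.trim.split("\\s+"))
      .filter(_.length >= 3)
      .map { logInfo =>
        val product = logInfo(1)
        val category = logInfo(2)
        // Note the key format "category-product"; it is split on "-" again below
        (category + "-" + product, 1)
      }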

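    /**
      * Every 10 seconds (slide interval), sum the clicks for each key over the
      * last 60 seconds (window length). Both durations must be multiples of
      * the 5-second batch interval.
      */
    val countDStream = pairDStream.reduceByKeyAndWindow(
      (v1: Int, v2: Int) => v1 + v2, Seconds(60), Seconds(10))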

    /**
      * For each window, find the top 3 products by click count
      * across all categories.
      */
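    countDStream.foreachRDD(itemRDD => {
      /**
        * Each element has the form (category-product, clickcount).
        * Split it back into category, product and clickcount and build an
        * RDD[Row], so the RDD can be converted to a DataFrame with a
        * programmatically specified schema.
        */
      val rowRDD = itemRDD.map(item => {
        // Split at the first "-" only, so a product name containing "-" stays whole
        val proInfo = item._1.split("-", 2)
        val category = proInfo(0)
        val product = proInfo(1)
        val clickcount = item._2
        Row(category, product, clickcount)
      })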
      /**
        * Schema of the product DataFrame
        */
      val structType = StructType(Array(
        StructField("category", StringType, true),
        StructField("product", StringType, true),
        StructField("clickcount", IntegerType, true)
      ))
      // Reuse one SQLContext across batches instead of creating a new one every 10 seconds
      val sqlContext = SQLContext.getOrCreate(itemRDD.sparkContext)
      // Create the DataFrame
      val df = sqlContext.createDataFrame(rowRDD, structType)
      // df.show()

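      // Register the DataFrame as the temporary table "proinfo"
      // (registerTempTable is deprecated since Spark 2.0; use createOrReplaceTempView there)
      df.registerTempTable("proinfo")

      // Top 3 (category, product) pairs by click count in the current window
      val top3info = sqlContext.sql("select * from proinfo order by clickcount desc limit 3")
      top3info.show()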

    })

    countDStream.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
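To see what `reduceByKeyAndWindow` produces without a cluster or an nc server, the sketch below replays the same parse-and-count steps on in-memory batches using only the Scala standard library. It is an illustration under stated assumptions, not Spark's implementation: `WindowSketch`, `parse`, `countWindow` and `slidingCounts` are hypothetical names, each inner `Seq` is assumed to hold the lines received in one 5-second batch, and every window is assumed to end on a slide boundary, so the first few windows cover fewer than 12 batches.

```scala
// Hypothetical, Spark-free sketch of what countDStream computes.
// Assumption: each inner Seq holds the raw log lines of one 5-second batch.
object WindowSketch {
  // Mirrors the map/filter step: "user product category" -> ("category-product", 1)
  def parse(line: String): Option[(String, Int)] = {
    val fields = line.trim.split("\\s+")
    if (fields.length >= 3) Some((fields(2) + "-" + fields(1), 1)) else None
  }

  // Sums clicks per key over the batches inside one window,
  // i.e. what reduceByKeyAndWindow((v1, v2) => v1 + v2, ...) yields for that window.
  def countWindow(batches: Seq[Seq[String]]): Map[String, Int] =
    batches.flatten.flatMap(parse)
      .groupBy(_._1)
      .map { case (key, pairs) => key -> pairs.map(_._2).sum }

  // Emits one window every `slideBatches` batches, each covering at most the
  // last `windowBatches` batches. With a 5 s batch, 60 s window and 10 s slide:
  // windowBatches = 12, slideBatches = 2.
  def slidingCounts(batches: Seq[Seq[String]],
                    windowBatches: Int,
                    slideBatches: Int): Seq[Map[String, Int]] =
    (slideBatches to batches.length by slideBatches).map { end =>
      countWindow(batches.slice(math.max(0, end - windowBatches), end))
    }
}
```

For example, `WindowSketch.slidingCounts(Seq.fill(14)(Seq("u iphone mobile")), 12, 2)` yields counts for `mobile-iphone` of 2, 4, 6, 8, 10, 12, 12: the window grows until it spans 12 batches, then slides.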

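The SQL query in the example ranks all products together. If the goal is instead the top 3 products within each category, a common approach in Spark SQL is a window function, for example `select category, product, clickcount from (select category, product, clickcount, row_number() over (partition by category order by clickcount desc) as rn from proinfo) t where rn <= 3`; note that on Spark 1.x window functions require a `HiveContext` rather than a plain `SQLContext`. The standard-library sketch below shows what the two rankings compute for one window's counts. `TopN`, `topOverall` and `topPerCategory` are hypothetical helpers, and ties are broken by product name here, whereas the SQL queries leave tie order unspecified.

```scala
// Hypothetical helpers mirroring the two ranking queries on plain Scala collections.
// Input: one window's counts keyed as "category-product", e.g. Map("mobile-iphone" -> 5).
object TopN {
  type Rec = (String, String, Int) // (category, product, clickcount)

  // Splits "category-product" at the first '-' (assumes the key contains one).
  private def splitKey(key: String): (String, String) = {
    val parts = key.split("-", 2)
    (parts(0), parts(1))
  }

  // Clicks descending, then product name ascending so ties are deterministic.
  private val byClicksDesc: Ordering[Rec] =
    Ordering.by[Rec, (Int, String)](r => (-r._3, r._2))

  private def rows(counts: Map[String, Int]): Seq[Rec] =
    counts.toSeq.map { case (key, clicks) =>
      val (category, product) = splitKey(key)
      (category, product, clicks)
    }

  // Same ranking as: select * from proinfo order by clickcount desc limit n
  def topOverall(counts: Map[String, Int], n: Int): Seq[Rec] =
    rows(counts).sorted(byClicksDesc).take(n)

  // Same ranking as the row_number() over (partition by category ...) <= n query
  def topPerCategory(counts: Map[String, Int], n: Int): Map[String, Seq[Rec]] =
    rows(counts).groupBy(_._1).map { case (category, rs) =>
      category -> rs.sorted(byClicksDesc).take(n)
    }
}
```

In the streaming job itself, switching between the two rankings would only change the SQL string passed to `sqlContext.sql`.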