Article URL: http://www.haha174.top/article/details/253627
1. Introduction
The real power of Spark Streaming is that it can be used together with Spark Core and Spark SQL. We have already seen, through operators such as transform and foreachRDD, how to run Spark Core batch operations on the RDDs inside a DStream. Now let's look at how to combine Spark SQL with Spark Streaming.
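As a quick refresher on the Spark Core side, here is a minimal, self-contained sketch of that transform() style of integration. The localhost source, the port, and the error-filtering logic are illustrative assumptions, not part of the example that follows; transform() simply hands you the RDD behind each batch so that any Spark Core operator can be applied to it.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class TransformSketch {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("TransformSketch").setMaster("local[2]");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(1));
        // Illustrative source: a local socket stream
        JavaDStream<String> logs = jssc.socketTextStream("localhost", 9999);
        // transform() exposes each batch as a plain RDD, so any Spark Core
        // operator (filter, join, etc.) can be used inside it
        JavaDStream<String> errors = logs.transform(new Function<JavaRDD<String>, JavaRDD<String>>() {
            @Override
            public JavaRDD<String> call(JavaRDD<String> rdd) throws Exception {
                return rdd.filter(new Function<String, Boolean>() {
                    @Override
                    public Boolean call(String line) throws Exception {
                        return line.contains("error");
                    }
                });
            }
        });
        errors.print();
        jssc.start();
        jssc.awaitTermination();
    }
}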
2. Example
Every 10 seconds, count the number of clicks on each product in each category over the last 60 seconds, and then compute the top 3 hottest products in each category. The full code is given below:
import java.util.ArrayList;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.api.java.function.VoidFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

import scala.Tuple2;

public class Top3HotProduct {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("Top3HotProduct").setMaster("local[2]");
        // Batch interval of 1 second; the window length and slide interval used
        // below must both be multiples of this value
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(1));
        // Input log format, one click event per line:
        // leo product1 category1   (user product category)
        // First, receive the input data from a socket
        JavaReceiverInputDStream<String> lines = jssc.socketTextStream("www.codeguoj.cn", 9999);
        // Map each line to a ("category-product", 1) pair
        JavaPairDStream<String, Integer> categoryProductDStream = lines.mapToPair(new PairFunction<String, String, Integer>() {
            @Override
            public Tuple2<String, Integer> call(String line) throws Exception {
                String[] productSplit = line.split(" ");
                // Key on "category-product" so counts are per product within a category
                return new Tuple2<>(productSplit[2] + "-" + productSplit[1], 1);
            }
        });
        // Then apply a window: every 10 seconds, run reduceByKey over the last
        // 60 seconds of data, producing the click count of each product in each
        // category within that window
        JavaPairDStream<String, Integer> categoryProductDStreamed = categoryProductDStream.reduceByKeyAndWindow(new Function2<Integer, Integer, Integer>() {
            @Override
            public Integer call(Integer v1, Integer v2) throws Exception {
                return v1 + v2;
            }
        }, Durations.seconds(60), Durations.seconds(10));
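        // Note: the window length (60s) and slide interval (10s) must both be
        // multiples of the batch interval configured on the StreamingContext.
        // This form of reduceByKeyAndWindow recomputes the whole window on each
        // slide; only the variant that also takes an inverse-reduce function
        // requires a checkpoint directory to be set.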
        // Then, for each window's per-category-per-product click counts,
        // use Spark SQL to rank the products within each category
        categoryProductDStreamed.foreachRDD(new VoidFunction<JavaPairRDD<String, Integer>>() {
            @Override
            public void call(JavaPairRDD<String, Integer> categoryProductCountRDD) throws Exception {
                // Convert each ("category-product", count) pair into a Row
                JavaRDD<Row> rowCategoryCount = categoryProductCountRDD.map(new Function<Tuple2<String, Integer>, Row>() {
                    @Override
                    public Row call(Tuple2<String, Integer> v1) throws Exception {
                        String category = v1._1.split("-")[0];
                        String product = v1._1.split("-")[1];
                        int count = v1._2;
                        return RowFactory.create(category, product, count);
                    }
                });
                // Convert the RDD of Rows into a Dataset<Row> with an explicit schema
                List<StructField> structFields = new ArrayList<>();
                structFields.add(DataTypes.createStructField("category", DataTypes.StringType, true));
                structFields.add(DataTypes.createStructField("product", DataTypes.StringType, true));
                structFields.add(DataTypes.createStructField("click_count", DataTypes.IntegerType, true));
                StructType structType = DataTypes.createStructType(structFields);
                SQLContext sqlContext = new SQLContext(rowCategoryCount.context());
                Dataset<Row> cataCountDS = sqlContext.createDataFrame(rowCategoryCount, structType);
                // Register the window's data as a temporary view
                // (createOrReplaceTempView is the Spark 2.x replacement for the
                // deprecated registerTempTable)
                cataCountDS.createOrReplaceTempView("product_click_log");
                // Rank products by click count within each category and keep the top 3
                Dataset<Row> cataSearchDS = sqlContext.sql(
                        "SELECT category,product,click_count "
                        + "FROM ("
                            + "SELECT "
                                + "category,"
                                + "product,"
                                + "click_count,"
                                + "row_number() OVER (PARTITION BY category ORDER BY click_count DESC) rank "
                            + "FROM product_click_log"
                        + ") tmp "
                        + "WHERE rank<=3");
                cataSearchDS.show();
            }
        });

        jssc.start();
        jssc.awaitTermination();
        // close() also stops the context, so a separate stop() call is not needed
        jssc.close();
    }
}
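To try the example locally, you would point socketTextStream at localhost instead, start a socket server with nc -lk 9999, and type lines in the user product category format shown above (for example, leo product1 category1). Every 10 seconds, show() then prints a table with the columns category, product, and click_count, containing at most three rows per category, computed over the most recent 60 seconds of input.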