Real-Time Analytics: A Product Rating Leaderboard with Flume + Kafka + Spark Streaming

Preface

This continues the previous article, "Flume + Kafka Data Collection and Cleaning".

This article focuses on processing that data in real time with Spark Streaming.

Stream Computing

Concepts

  • Stream computing ingests massive amounts of data from different sources in real time and analyzes it on the fly to extract valuable information.

  • Stream computing rests on a basic premise: the value of data decreases as time passes (user click streams are a typical example). Events should therefore be processed the moment they occur, rather than buffered for later batch processing. Handling stream data in a timely way requires a low-latency, scalable, and highly reliable processing engine.

  • A stream computing system should satisfy the following requirements:

    • High performance
    • Massive scale
    • Real-time processing
    • Distribution
    • Ease of use
    • Reliability

Spark Streaming

Spark Streaming can integrate a variety of input sources, such as Kafka, Flume, HDFS, and even plain TCP sockets. The processed data can be written to file systems or databases, or displayed on a dashboard.

The basic idea behind Spark Streaming is to split the live input stream into time slices (at second-level granularity) and have the Spark engine process each slice in a batch-like fashion.

Spark Streaming execution flow

Spark Streaming's primary abstraction is the DStream (Discretized Stream), which represents a continuous flow of data. Internally, the input data is divided by time slice (e.g., one second) into a sequence of segments; each segment is materialized as a Spark RDD, and every operation on a DStream ultimately translates into operations on the underlying RDDs.

DStream operation diagram
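As a rough illustration of this micro-batching idea, here is a plain-Java sketch with no Spark dependency (the class and method names are invented for this example): cutting a stream into fixed-width time slices amounts to bucketing timestamped events by batch interval.

```java
import java.util.*;

public class MicroBatchSketch {

    // One incoming event: arrival time in milliseconds plus a payload.
    static class Event {
        final long timeMillis;
        final String payload;
        Event(long timeMillis, String payload) { this.timeMillis = timeMillis; this.payload = payload; }
    }

    /**
     * Group events into fixed-width time slices (batches), keyed by the start
     * of each interval. Each batch is what Spark Streaming would turn into one
     * RDD and hand to the batch engine.
     */
    static SortedMap<Long, List<String>> toBatches(List<Event> events, long batchMillis) {
        SortedMap<Long, List<String>> batches = new TreeMap<>();
        for (Event e : events) {
            long bucket = (e.timeMillis / batchMillis) * batchMillis;
            batches.computeIfAbsent(bucket, k -> new ArrayList<>()).add(e.payload);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Event> stream = List.of(
                new Event(0, "a"), new Event(400, "b"),
                new Event(1100, "c"), new Event(1900, "d"));
        // With a 1-second batch interval, the four events land in two batches.
        System.out.println(toBatches(stream, 1000)); // prints {0=[a, b], 1000=[c, d]}
    }
}
```

The difference in the real system is that Spark Streaming does this continuously on live input rather than on a finished list.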

Feature Overview


The data collection and cleaning pipeline was covered in "Flume + Kafka Data Collection and Cleaning" and is not repeated here.

What this article adds is the recommendation flow:

1. When a user opens the live product rating leaderboard, the page establishes a WebSocket connection with the business service.

2. The recommendation service runs its real-time computation and writes the result to Kafka under the rating topic.

3. The business service consumes messages from the rating topic and pushes the results to the user's browser over the WebSocket.

4. When the page receives a message over the WebSocket, it renders the data for display.

Feature Development

BusinessServer

The business service, built as a Spring Boot + Maven project.

Add the websocket and kafka-clients dependencies:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-websocket</artifactId>
</dependency>
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>2.8.0</version>
</dependency>

The WebSocket configuration:

package cn.javayuli.businessserver.config;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.socket.server.standard.ServerEndpointExporter;

/**
 * websocket config
 *
 * @author hanguilin
 */
@Configuration
public class WebSocketConfig {

    /**
     * Registers a ServerEndpointExporter bean, which automatically registers
     * every WebSocket endpoint declared with the @ServerEndpoint annotation.
     */
    @Bean
    public ServerEndpointExporter serverEndpointExporter() {
        return new ServerEndpointExporter();
    }
}

The WebSocket endpoint handler:

package cn.javayuli.businessserver.websocket;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;

import javax.websocket.*;
import javax.websocket.server.ServerEndpoint;
import java.io.IOException;
import java.util.Optional;
import java.util.concurrent.CopyOnWriteArraySet;

/**
 * WebSocket service
 *
 * @author hanguilin
 */
@Component
@ServerEndpoint("/endpoint/business")
public class SocketHandler {

    private static CopyOnWriteArraySet<Session> sessionSet = new CopyOnWriteArraySet<>();

    private static final Logger LOGGER = LoggerFactory.getLogger(SocketHandler.class);

    @OnOpen
    public void onOpen(Session session) {
        sessionSet.add(session);
    }

    @OnMessage
    public void onMessage(String message, Session session) {

    }

    @OnClose
    public void onClose(Session session) {
        sessionSet.remove(session);
    }

    @OnError
    public void onError(Session session, Throwable throwable) {
        sessionSet.remove(session);
        LOGGER.error("WebSocket server error", throwable);
    }

    /**
     * Broadcast a message to every connected session.
     *
     * @param message the message to send
     */
    public static void sendMessage(String message) {
        sessionSet.forEach(session -> {
            try {
                session.getBasicRemote().sendText(message);
            } catch (IOException e) {
                LOGGER.error("Failed to send message to client", e);
            }
        });
    }

    /**
     * Send a message to a specific session.
     *
     * @param message   the message to send
     * @param sessionId the id of the target session
     */
    public static void sendMessage(String message, String sessionId) {
        Optional<Session> first = sessionSet.stream().filter(o -> sessionId.equals(o.getId())).findFirst();
        first.ifPresent(session -> {
            try {
                session.getBasicRemote().sendText(message);
            } catch (IOException e) {
                LOGGER.error("Failed to send message to client", e);
            }
        });
    }

}

The Kafka consumer:

package cn.javayuli.businessserver.runner;

import cn.javayuli.businessserver.websocket.SocketHandler;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.springframework.boot.CommandLineRunner;
import org.springframework.stereotype.Component;

import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;

/**
 * Kafka consumer
 *
 * @author hanguilin
 */
@Component
public class KafkaRatingConsumer implements CommandLineRunner {

    private static final Properties properties = new Properties();

    static {
        properties.put("bootstrap.servers", "192.168.1.43:9092");
        properties.put("group.id", "group-1");
        properties.put("enable.auto.commit", "true");
        properties.put("auto.commit.interval.ms", "1000");
        properties.put("auto.offset.reset", "earliest");
        properties.put("session.timeout.ms", "30000");
        properties.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        properties.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    }

    @Override
    public void run(String... args) {
        KafkaConsumer<String, String> kafkaConsumer = new KafkaConsumer<>(properties);
        // Subscribe to the rating topic
        kafkaConsumer.subscribe(Arrays.asList("rating"));
        while (true) {
            ConsumerRecords<String, String> records = kafkaConsumer.poll(Duration.ofMillis(100));
            for (ConsumerRecord<String, String> record : records) {
                // Push the record to every connected WebSocket client
                SocketHandler.sendMessage(record.value());
            }
        }
    }
}

This class implements CommandLineRunner so that, as soon as the Spring Boot application starts, it begins listening on Kafka's rating topic.
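One caveat: run() never returns, so the endless poll loop occupies the thread that drives Spring Boot startup. A common alternative is to run the polling loop on its own thread. The sketch below shows that pattern without any Spring or Kafka dependency (the class name is invented for this example, and the Supplier stands in for KafkaConsumer.poll):

```java
import java.util.List;
import java.util.concurrent.*;
import java.util.function.Supplier;

public class BackgroundPoller {

    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private volatile boolean running = true;
    private final ConcurrentLinkedQueue<String> received = new ConcurrentLinkedQueue<>();

    /** Start consuming on a background thread; poll stands in for KafkaConsumer.poll(). */
    public void start(Supplier<List<String>> poll) {
        executor.submit(() -> {
            while (running) {
                for (String record : poll.get()) {
                    received.add(record); // in the real service: SocketHandler.sendMessage(record)
                }
            }
        });
    }

    /** Signal the loop to stop and shut the executor down. */
    public void stop() {
        running = false;
        executor.shutdown();
    }

    public List<String> drained() {
        return List.copyOf(received);
    }

    public static void main(String[] args) throws InterruptedException {
        BackgroundPoller poller = new BackgroundPoller();
        BlockingQueue<String> fake = new LinkedBlockingQueue<>(List.of("msg-1", "msg-2"));
        poller.start(() -> {
            String r = fake.poll();
            return r == null ? List.of() : List.of(r);
        });
        Thread.sleep(200); // give the background thread time to drain the queue
        poller.stop();
        System.out.println(poller.drained()); // prints [msg-1, msg-2]
    }
}
```

In the real consumer, poll.get() would wrap kafkaConsumer.poll(Duration.ofMillis(100)) and the loop body would forward each record value over the WebSocket.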

StreamingRecommender

The product recommendation service, a plain Java + Maven project.

The real-time computation class:

package cn.javayuli.streamrecommender.streaming;

import cn.hutool.json.JSONUtil;
import com.google.common.collect.Lists;
import com.google.common.collect.Maps;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.Optional;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.VoidFunction;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;
import scala.Tuple2;

import java.math.BigDecimal;
import java.util.*;

/**
 * Real-time top-rated products (the live rating leaderboard)
 *
 * @author hanguilin
 */
public class RealTimeTopRate {

    public static final Map<String, Object> kafkaParams;

    public static KafkaProducer<String, Object> stringObjectKafkaProducer;

    static {
        Map<String, Object> temp = Maps.newHashMap();
        temp.put("bootstrap.servers", "192.168.1.43:9092");
        temp.put("key.deserializer", StringDeserializer.class);
        temp.put("value.deserializer", StringDeserializer.class);
        temp.put("key.serializer", StringSerializer.class);
        temp.put("value.serializer", StringSerializer.class);
        temp.put("group.id", "wordGroup");
        temp.put("auto.offset.reset", "latest");
        temp.put("enable.auto.commit", false);
        kafkaParams = Collections.unmodifiableMap(temp);

        stringObjectKafkaProducer = new KafkaProducer<>(kafkaParams);
    }

    public static void main(String[] args) throws InterruptedException {
        SparkConf sparkConf = new SparkConf()
                .setMaster("local[*]")
                .setAppName("computeTopRate");
        JavaSparkContext javaSparkContext = new JavaSparkContext(sparkConf);
        javaSparkContext.setLogLevel("ERROR");
        javaSparkContext.setCheckpointDir("./checkpoint");
        JavaStreamingContext javaStreamingContext = new JavaStreamingContext(javaSparkContext, Durations.milliseconds(500));

        // Kafka topics to subscribe to
        List<String> topics = Arrays.asList("recommender");

        JavaInputDStream<ConsumerRecord<String, String>> stream = KafkaUtils.createDirectStream(javaStreamingContext, LocationStrategies.PreferConsistent(), ConsumerStrategies.Subscribe(topics, kafkaParams));
        JavaPairDStream<String, BigDecimal> counts = stream.flatMap(o -> {
                    // Take the Kafka message and split it on |, e.g. "PRODUCT_RATING_PREFIX:han|奶茶|5.2|1619514025"
                    String[] split = o.value().split("\\|");
                    // Join the elements at index 1 and index 2 with |, e.g. "奶茶|5.2"
                    return Lists.newArrayList(split[1] + "|" + split[2]).iterator();
                })
                .mapToPair(o -> {
                    // e.g. "奶茶|5.2"
                    String[] split = o.split("\\|");
                    return new Tuple2<>(split[0], new BigDecimal(split[1]));
                })
                .reduceByKey(BigDecimal::add);

        JavaPairDStream<String, BigDecimal> result = counts
                .updateStateByKey(new Function2<List<BigDecimal>, Optional<BigDecimal>, Optional<BigDecimal>>() {

                    private static final long serialVersionUID = 1L;

                    /**
                     * The update function.
                     *
                     * @param values the values grouped under this key in the current batch
                     * @param state  the accumulated value for this key from previous batches
                     * @return the updated value for this key
                     */
                    @Override
                    public Optional<BigDecimal> call(List<BigDecimal> values,
                                                  Optional<BigDecimal> state) {

                        BigDecimal updateValue = BigDecimal.ZERO;
                        // If prior state exists, start from it
                        if (state.isPresent()) {
                            updateValue = state.get();
                        }
                        // Add the values from the current batch
                        for (BigDecimal value : values) {
                            updateValue = updateValue.add(value);
                        }
                        // Return the new value for this key
                        return Optional.of(updateValue);
                    }
                });
        // Publish the results to Kafka's rating topic
        JavaDStream<String> resultDStream = result.map((Function<Tuple2<String, BigDecimal>, String>) stringBigDecimalTuple2 -> String.format("%s,%s", stringBigDecimalTuple2._1(), stringBigDecimalTuple2._2()));
        resultDStream.foreachRDD((VoidFunction<JavaRDD<String>>) stringRDD -> {
            stringObjectKafkaProducer.send(new ProducerRecord<>("rating", "data", JSONUtil.toJsonStr(stringRDD.collect())));
        });
        result.print();

        javaStreamingContext.start();
        javaStreamingContext.awaitTermination();
    }
}
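The transformation chain above is easy to sanity-check in isolation. The following plain-Java sketch (no Spark dependency; the class and method names are invented, and java.util.Optional stands in for Spark's org.apache.spark.api.java.Optional) mirrors the two core pieces of logic: parsing a raw log line into a (product, score) pair, and folding a batch of scores into the running total the way the updateStateByKey function does:

```java
import java.math.BigDecimal;
import java.util.*;

public class RatingLogicSketch {

    /**
     * Extract (product, score) from a raw line like
     * "PRODUCT_RATING_PREFIX:han|奶茶|5.2|1619514025".
     * split[0] = prefix:user, split[1] = product, split[2] = score, split[3] = timestamp.
     */
    static Map.Entry<String, BigDecimal> parse(String line) {
        String[] split = line.split("\\|");
        return Map.entry(split[1], new BigDecimal(split[2]));
    }

    /**
     * Mirror of the updateStateByKey update function: previous state
     * (if any) plus all values seen for the key in this batch.
     */
    static BigDecimal updateState(List<BigDecimal> batchValues, Optional<BigDecimal> state) {
        BigDecimal total = state.orElse(BigDecimal.ZERO);
        for (BigDecimal value : batchValues) {
            total = total.add(value);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(parse("PRODUCT_RATING_PREFIX:han|milk tea|5.2|1619514025")); // prints milk tea=5.2

        BigDecimal first = updateState(List.of(new BigDecimal("5.2")), Optional.empty());
        BigDecimal second = updateState(List.of(new BigDecimal("3.0")), Optional.of(first));
        System.out.println(second); // prints 8.2
    }
}
```

Each 500 ms batch therefore adds its scores onto the totals carried over from all previous batches, which is why the checkpoint directory is required: it is where Spark persists that per-key state.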

StreamingRecommenderWeb

A single-page web front end.

The live product rating leaderboard page:

<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Live Rating Leaderboard</title>
    <script src="https://cdn.bootcdn.net/ajax/libs/jquery/3.6.0/jquery.min.js"></script>
    <script src="https://cdn.bootcdn.net/ajax/libs/moment.js/2.29.1/moment.min.js"></script>
</head>
<body>

    <div class="all">
        <h1>Live Rating Leaderboard</h1>
        <span class="refresh-time">Last refreshed: <span class="time-text"></span></span>
        <div class="product-table">
            <table>
                <thead>
                    <tr>
                        <th>Product</th>
                        <th>Total Score</th>
                    </tr>
                </thead>
                <tbody>

                </tbody>
            </table>
        </div>
    </div>
    
    <script>
        let champion = '<svg t="1619600325685" class="icon" viewBox="0 0 1024 1024" version="1.1" xmlns="http://www.w3.org/2000/svg" p-id="3793" width="25" height="25" fill="red"><path d="M657.212778 692.010817c198.527542-34.618449 302.065663-272.621051 302.065663-485.913878-63.067384 0-134.414339 0-162.91751 0 9.620102-68.769246 15.646352-124.139322 15.646352-143.373385-91.650373 0-506.526335 0-598.173638 0 3.317559 36.776601 7.833409 87.304392 14.016225 143.373385-28.79686 0-99.963713 0-163.856905 0 0 216.192877 104.930841 453.824019 307.677475 486.460324 32.152281 41.288358 56.08025 58.536184 83.363639 67.988464 0 19.120476 0 47.786353 0 47.786353L339.263355 808.332079l-86.837764 152.959717 520.990768 0L686.578596 808.333102 570.804802 808.333102l0-47.786353C595.832825 754.150062 628.084367 727.289298 657.212778 692.010817zM715.530998 598.039585c24.75173-57.706283 52.937676-207.106941 72.409147-334.572008 12.798491 0 32.917714 0 60.302411 0 0 179.800016-78.458933 313.974901-135.833665 341.376994C713.450616 602.567715 714.549647 600.293929 715.530998 598.039585zM175.243745 263.4686c26.479071 0 46.303582 0 59.31799 0 15.940041 128.166031 40.496319 269.923616 78.456886 341.55198C254.878456 578.178235 175.243745 443.267593 175.243745 263.4686zM399.837966 563.42216 399.837966 503.305991l75.555813 0L475.393779 271.492344l-77.660754 17.544585 0-61.98882 153.216567-32.280194L550.949592 503.305991l75.088162 0 0 60.117193L399.837966 563.423183z" p-id="3794"></path></svg>'
        let second = '<svg t="1619600129995" class="icon" viewBox="0 0 1024 1024" version="1.1" xmlns="http://www.w3.org/2000/svg" p-id="2733" width="25" height="25" fill="green"><path d="M657.212778 692.010817c198.527542-34.618449 302.065663-272.621051 302.065663-485.913878-63.067384 0-134.414339 0-162.91751 0 9.620102-68.769246 15.646352-124.139322 15.646352-143.373385-91.650373 0-506.526335 0-598.173638 0 3.317559 36.776601 7.833409 87.304392 14.016225 143.373385-28.79686 0-99.963713 0-163.856905 0 0 216.192877 104.930841 453.824019 307.677475 486.460324 32.152281 41.288358 56.08025 58.536184 83.363639 67.988464 0 19.120476 0 47.786353 0 47.786353L339.263355 808.332079l-86.837764 152.959717 520.990768 0L686.578596 808.333102 570.804802 808.333102l0-47.786353C595.832825 754.150062 628.084367 727.289298 657.212778 692.010817zM715.530998 598.039585c24.75173-57.706283 52.937676-207.106941 72.409147-334.572008 12.798491 0 32.917714 0 60.302411 0 0 179.800016-78.458933 313.974901-135.833665 341.376994C713.450616 602.567715 714.549647 600.293929 715.530998 598.039585zM175.243745 263.4686c26.479071 0 46.303582 0 59.31799 0 15.940041 128.166031 40.496319 269.923616 78.456886 341.55198C254.878456 578.178235 175.243745 443.267593 175.243745 263.4686zM384.166031 563.42216l0-59.415204 104.093776-99.882871c21.831214-20.895913 37.112246-38.439474 45.848211-52.631708 8.730848-14.18814 13.099343-28.925796 13.099343-44.210921 0-34.462906-18.479887-51.696406-55.438637-51.696406-31.03483 0-61.053518 12.478196-90.059132 37.427424l0-67.134003c31.030737-20.584827 66.276473-30.877241 105.731066-30.877241 35.709293 0 64.012919 9.085936 84.912925 27.251667 20.894889 18.168802 31.344892 43.315528 31.344892 75.439156 0 42.419112-25.731034 86.86437-77.193103 133.332704l-75.555813 68.070328 0 1.403976 149.473313 0 0 62.924121L384.166031 563.423183z" p-id="2734"></path></svg>'
        let third = '<svg t="1619600162215" class="icon" viewBox="0 0 1024 1024" version="1.1" xmlns="http://www.w3.org/2000/svg" p-id="3321" width="25" height="25" fill="blue"><path d="M657.212778 692.010817c198.527542-34.618449 302.065663-272.621051 302.065663-485.913878-63.067384 0-134.414339 0-162.91751 0 9.620102-68.769246 15.646352-124.139322 15.646352-143.373385-91.650373 0-506.526335 0-598.173638 0 3.317559 36.776601 7.833409 87.304392 14.016225 143.373385-28.79686 0-99.963713 0-163.856905 0 0 216.192877 104.930841 453.824019 307.677475 486.460324 32.152281 41.288358 56.08025 58.536184 83.363639 67.988464 0 19.120476 0 47.786353 0 47.786353L339.263355 808.332079l-86.837764 152.959717 520.990768 0L686.578596 808.333102 570.804802 808.333102l0-47.786353C595.832825 754.150062 628.084367 727.289298 657.212778 692.010817zM715.530998 598.039585c24.75173-57.706283 52.937676-207.106941 72.409147-334.572008 12.798491 0 32.917714 0 60.302411 0 0 179.800016-78.458933 313.974901-135.833665 341.376994C713.450616 602.567715 714.549647 600.293929 715.530998 598.039585zM175.243745 263.4686c26.479071 0 46.303582 0 59.31799 0 15.940041 128.166031 40.496319 269.923616 78.456886 341.55198C254.878456 578.178235 175.243745 443.267593 175.243745 263.4686zM483.814566 569.738006c-37.272905 0-67.291592-6.161327-90.058109-18.479887l0-66.666352c24.170492 18.092054 52.398394 27.13501 84.678588 27.13501 20.427238 0 36.644595-4.520967 48.654117-13.566994 12.006452-9.041934 18.012236-21.907962 18.012236-38.59604 0-17.152658-7.134491-30.252001-21.403472-39.298028-14.267958-9.041934-34.268478-13.566994-59.999512-13.566994l-32.280194 0 0-58.245566 29.707602 0c48.65514 0 72.983221-16.373922 72.983221-49.122791 0-30.877241-18.637476-46.315862-55.906288-46.315862-24.327058 0-47.953152 7.953136-70.877257 23.859407L407.325497 214.416418c24.949228-12.941753 54.345746-19.415189 88.187505-19.415189 34.305317 0 62.142316 7.916297 83.507926 23.74275 21.363564 15.829524 32.04688 65.847707 0 47.408753-24.016996 77.193103-72.046896 89.357144l0 1.169639c25.263383 2.806929 45.457308 12.046361 60.584843 27.719318 15.124466 15.671934 22.690792 34.736129 22.690792 57.192583 0 34.466999-12.595876 61.366649-37.778418 80.70202C559.33047 560.070832 525.762957 569.738006 483.814566 569.738006z" p-id="3322"></path></svg>'
        $(function(){
            initWebsocket()
        })
        function initWebsocket () {
            if ("WebSocket" in window)
            {
               // Open a WebSocket connection
               var ws = new WebSocket("ws://192.168.1.43:7001/endpoint/business");
                
               ws.onopen = function()
               {
                  // The WebSocket is connected; data can be sent with send()
                  ws.send("hello");
               };
                
               ws.onmessage = function (evt) 
               { 
                  var received_msg = evt.data;
                  consumer(received_msg)
               };
                
               ws.onclose = function()
               { 
                  // The connection has been closed
               };
            }
            
            else
            {
               // The browser does not support WebSocket
               alert("Your browser does not support WebSocket!");
            }
        }

        function consumer (received_msg) {
            let html = JSON.parse(received_msg).sort((x, y) => {
                let val1 = x.split(',')[1]
                let val2 = y.split(',')[1]
                return val2 - val1
            }).map((e, i) => {
                let arr = e.split(',')
                let extra = getExtra(i)
                return `<tr><td>${extra}${arr[0]}</td><td>${arr[1]}</td></tr>`
            }).join('')
            $('.product-table tbody').html(html)
            $('.time-text').html(moment().format('YYYY/MM/DD HH:mm:ss'))
        }

        function getExtra (i) {
            switch (i) {
                case 0:
                    return champion
                case 1:
                    return second
                case 2:
                    return third
                default:
                    return ''
            }
        }
    </script>
</body>
</html>

The page for sending simulated data:

<html lang="en">

<head>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Simulated Data Sender</title>
    <script src="https://cdn.bootcdn.net/ajax/libs/jquery/3.6.0/jquery.min.js"></script>
    <script src="https://cdn.bootcdn.net/ajax/libs/moment.js/2.29.1/moment.min.js"></script>
</head>

<body>

    <div class="all">
        <button id="submit">Start Sending</button>
        <div class="data-table">
            <table>
                <thead>
                    <tr>
                        <th>Product</th>
                        <th>Score</th>
                        <th>Sent By</th>
                        <th>Sent At</th>
                    </tr>
                </thead>
                <tbody>

                </tbody>
            </table>
        </div>
    </div>

    <script>
        let productArr = ['新款陈数明星同款醋酸缎面桔色西装春夏百搭显瘦西服OL套装女','OVV春夏提花面料V领长袖连衣裙GQLCJ11003A','春季新款七分袖职业收腰黑色小西装西服夏季薄款白色西装外套女','风衣女士中长款2021年春秋新款洋气垂感薄款气质高端大气外套过膝','高端气质风衣外套女初春秋季2021年新款小个子短款贵夫人洋气薄款','YIMU粉色西装外套女春装毛呢双排扣人字纹炸街垫肩名媛修身上衣','自制 无敌好面料 春秋 西服白色西装外套又飒又帅出街即C位 垫肩','ONLY夏季新款简约通勤版型单排纽扣衣襟西服女','ONLY夏季新款时尚短款泡泡袖V领系带收腰衬衫女','2021新款春装职业白色西装女套装ol时尚气质小香风休闲外套工作服','2021新款清新绿色醋酸西装气质垂感阔腿裤两件套OL套装女高端洋气','高阶定制jin口重磅全真丝!女一粒扣含腰带垫肩长袖西装外套83961','《初壹》春夏新款森系盘扣女装 日常中式改良双层无袖连衣裙 女','Z老板 层次艺术感 羊毛两件套围裙西装 中长款西装领可拆卸马甲女','2018春秋新款V领修身丝绒西装女 双排扣复古极简中长款工装外套','MONA新款2021年今年爆款卫衣女设计感小众小个子春秋季薄款潮ins','chic西装外套女春季2021新款韩版一粒扣休闲网红修身小西服上衣潮','【店长推荐】Buou Buou新款夏外搭上衣防晒衣女夏季短外套DF2C002','西装短外套女士小个子早春季西服设计感小众上衣2021新款秋薄潮夏','FAYEYE SHOP自制 高品质!好版型!洋气韩版宽松休闲小西装外套女','Uslonsrd粉色西装女装薄款外套职业装2021春夏九分袖半衬亚麻上衣','小熊2021夏新款JK制服马甲日系学院风西装背心外套女TTVW216501N','PRICH2021年春季新款宽松休闲上衣日系慵懒针织开衫女PRKCB5103Q','娅丽达黑色小西装外套女2021春夏薄款上衣短款小个子白色休闲西服','秋季新款黑色职业套装气质西装商务正装面试装工作服职业装OL工装','中老年女装加大码全棉长袖格子衬衫胖妈妈装春秋薄款衬衣宽松上衣','秘密盒子可诉可可挚爱 收藏级品质高定30姆米重磅真丝西装外套女','13良品小个子 颜色细节都很用心西服150版型奶黄色斜扣西装外套女','PRICH2021年春季新款气质系带英伦风设计感风衣外套女PRJTB5102R','2021韩版宽松休闲网红小西装外套女新款春秋一粒扣西服上衣英伦风','浅秋金菊假两件套2021新款女装潮春款爆款毛衣女士上衣针织打底衫','丝绒西装外套女春秋2021上衣气质高端职业装休闲金丝绒小西服套装','【雅叙】qtu亚麻女装短西装修身春秋装新款西服长袖小外套女W1142','OFFIY治愈系SUN轻熟职业通勤双排扣西服外套时尚春款西装套装气质','陈小颖Jupiter春秋显瘦名媛OL百搭西装外套女新款设计感小众上衣','阔版小西服女2021春秋新款设计感小众外套姜黄色韩版宽松休闲西装','西装外套女2021早春新款奶白色减龄炸街高级范设计感别致西服上衣','品牌折扣店商场专柜撤柜外贸女装尾货短袖上衣设计感小众V领T恤夏','觅格经典双排扣收腰蓝色职业装套装女高级感西装外套气质女神范03','Uslonsrd亚麻西装女装薄款外套休闲职业装2021春装九分袖米色上衣','复古港味雪纺花衬衫女春款2021新款宽松设计感小众洋气长袖上衣潮','Vero Moda2021春夏新款韩版格纹通勤单扣西装外套女','Oece气质小西装2021年春装新款女装复古英伦炸街韩版休闲西服外套','粉色小西装女外套2021春秋季新款休闲韩版网红短款小个子女士西服','2021春秋季新款浅灰色西服上衣宽松休闲网格子小西装外套女韩版潮','西装+百褶短裙是正解!浅荞麦绿特殊复合丝手工捏褶女套装83929','chic小个子粉色西装外套女薄款春秋2021新款高级感西服设计感小众','西装外套女薄款设计感小众春秋2021新款复古炸街烫钻黑色西服上衣','网红西装套装女英伦风 韩版小香风春秋炸街休闲气质时尚职业西服','BBBLUE小众自制早春灰白拼色泡泡袖小西装高级感设计时髦短外套女','2021春装新款女外套提花西装短款长袖小西服韩版修身上衣一粒扣潮','CAN2021春夏新款 法式荷叶边V领衬衫设计感小众宽松显瘦短款上衣','王小鸭2021春季新款修身显瘦洋气一粒扣OL气质优雅长袖西装外套女','小西装质感女士上衣高级感炸街重磅醋酸缎面早春西装外套女春夏','西装套装女韩版职业装套裤修身面试商务正装小西服工作服红色春秋','秘密盒子[现货]雅致新色 21春夏新款法式简约黄色混搭风西装女','衣然故我小西装外套女2021新款通勤气质宽松一粒扣浅卡其色上衣','觅格双排扣精纺羊毛收腰职业套装女高级感西装外套气质女神范V65','ROCOCO2021春夏新款百搭甜美V领时髦碎花减龄显瘦全棉小衫上衣女','原创黑色奶咖色棕色复古宽松廓形落肩炸街西服英伦风西装外套女秋']
        let run = true
        let timer
        $(function () {
            $('#submit').click(function () {
                if (run) {
                    $(this).html('Stop Sending')
                    timer = setInterval(() => {
                        let score = randomNum(1, 10)
                        let product = getRandomProduct()
                        let user = 'han'
                        $.ajax({
                            url: 'http://192.168.1.43:7001/rate',
                            type: 'POST',
                            contentType: 'application/json',
                            data: JSON.stringify({score, user, product}),
                            success: () => {
                                $('.data-table tbody').prepend(`<tr><td>${product}</td><td>${score}</td><td>${user}</td><td>${moment().format('YYYY/MM/DD HH:mm:ss')}</td></tr>`)
                            }
                        })
                    }, 200)
                    run = false
                } else {
                    $(this).html('Start Sending')
                    clearInterval(timer)
                    run = true
                }
            })
        })
        // Generate a random integer between minNum and maxNum
        function randomNum(minNum, maxNum) {
            switch (arguments.length) {
                case 1:
                    return parseInt(Math.random() * minNum + 1, 10)
                case 2:
                    return parseInt(Math.random() * (maxNum - minNum + 1) + minNum, 10)
                default:
                    return 0
            }
        }

        function getRandomProduct () {
            let random = parseInt(Math.random() * productArr.length)
            return productArr[random]
        }

    </script>
</body>

</html>

Deployment

Start ZooKeeper

cd zookeeper
./bin/zkServer.sh start

Start Kafka

cd ~/kafka
./bin/kafka-server-start.sh -daemon ./config/server.properties

Start the business service

The BusinessServer project.

Package it as a jar and upload it to the server (the same machine that runs Flume, since both need to access the same log file).

Run it directly with java -jar BusinessServer-0.0.1.jar.

Start Flume

cd ~/flume
./bin/flume-ng agent -c ./conf/ -f ./conf/log-kafka.properties -n agent -Dflume.root.logger=INFO,console

Start the cleaning service

The KafkaStreaming project.

This can be started locally; the entry class is cn.javayuli.kafkastream.KafkaStreamApp.

Start the recommendation service

The StreamingRecommender project.

This can be started locally; the entry class is cn.javayuli.streamrecommender.streaming.RealTimeTopRate.

Preview

Sending simulated data:

Viewing the live product rating leaderboard:

No need to wonder why the timestamps in the two GIFs differ so much; it is not a problem with the program.

Resources

Only the key code is shown in this article; for the complete source, see the git repository Recommender.
