电商推荐系统五：实时系统联调

最新推荐文章于 2021-05-28 00:01:21 发布

大数据面壁者

最新推荐文章于 2021-05-28 00:01:21 发布

阅读量600

点赞数 2

分类专栏：推荐系统文章标签：推荐系统大数据

本文链接：https://blog.csdn.net/weixin_42796403/article/details/113731054

版权

推荐系统专栏收录该内容

10 篇文章 9 订阅

订阅专栏

5.4 实时系统联调

我们的系统实时推荐的数据流向是：业务系统 -> 日志 -> flume 日志采集 -> kafka streaming数据清洗和预处理 -> spark streaming 流式计算。在我们完成实时推荐服务的代码后，应该与其它工具进行联调测试，确保系统正常运行。

5.4.1 启动实时系统的基本组件

启动实时推荐系统StreamingRecommender以及mongodb、redis

5.4.2 启动zookeeper

bin/zkServer.sh start

5.4.3 启动kafka

bin/kafka-server-start.sh -daemon ./config/server.properties

5.4.4 构建Kafka Streaming程序

在recommender下新建module，KafkaStreaming，主要用来做日志数据的预处理，过滤出需要的内容。pom.xml文件需要引入依赖：

<dependencies>
    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-streams</artifactId>
        <version>0.10.2.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-clients</artifactId>
        <version>0.10.2.1</version>
    </dependency>
</dependencies>

<build>
    <finalName>kafkastream</finalName>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
                <archive>
                    <manifest>
                        <mainClass>com.atguigu.kafkastream.Application</mainClass>
                    </manifest>
                </archive>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>

在src/main/java下新建java类com.recom.kafkastreaming.Application

public class Application {
    public static void main(String[] args){

        String brokers = "localhost:9092";
        String zookeepers = "localhost:2181";

        // 定义输入和输出的topic
        String from = "log";
        String to = "recommender";

        // 定义kafka streaming的配置
        Properties settings = new Properties();
        settings.put(StreamsConfig.APPLICATION_ID_CONFIG, "logFilter");
        settings.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, brokers);
        settings.put(StreamsConfig.ZOOKEEPER_CONNECT_CONFIG, zookeepers);

        StreamsConfig config = new StreamsConfig(settings);

        // 拓扑建构器
        TopologyBuilder builder = new TopologyBuilder();

        // 定义流处理的拓扑结构
        builder.addSource("SOURCE", from)
                .addProcessor("PROCESS", () -> new LogProcessor(), "SOURCE")
                .addSink("SINK", to, "PROCESS");

        KafkaStreams streams = new KafkaStreams(builder, config);
        streams.start();
    }
}

这个程序会将topic为“log”的信息流获取来做处理，并以“recommender”为新的topic转发出去。

流处理程序 LogProcess.java

public class LogProcessor implements Processor<byte[],byte[]> {
    private ProcessorContext context;

    public void init(ProcessorContext context) {
        this.context = context;
    }

    public void process(byte[] dummy, byte[] line) {
        String input = new String(line);
        // 根据前缀过滤日志信息，提取后面的内容
        if(input.contains("PRODUCT_RATING_PREFIX:")){
            System.out.println("product rating coming!!!!" + input);
            input = input.split("PRODUCT_RATING_PREFIX:")[1].trim();
            context.forward("logProcessor".getBytes(), input.getBytes());
        }
    }
    public void punctuate(long timestamp) {
    }
    public void close() {
    }
}

完成代码后，启动Application。

5.4.5 配置并启动flume

在flume的conf目录下新建log-kafka.properties，对flume连接kafka做配置：

agent.sources = exectail
agent.channels = memoryChannel
agent.sinks = kafkasink

# For each one of the sources, the type is defined
agent.sources.exectail.type = exec
# 下面这个路径是需要收集日志的绝对路径，改为自己的日志目录
agent.sources.exectail.command = tail –f
/mnt/d/Projects/BigData/ECommerceRecommenderSystem/businessServer/src/main/log/agent.log
agent.sources.exectail.interceptors=i1
agent.sources.exectail.interceptors.i1.type=regex_filter
# 定义日志过滤前缀的正则
agent.sources.exectail.interceptors.i1.regex=.+PRODUCT_RATING_PREFIX.+
# The channel can be defined as follows.
agent.sources.exectail.channels = memoryChannel

# Each sink's type must be defined
agent.sinks.kafkasink.type = org.apache.flume.sink.kafka.KafkaSink
agent.sinks.kafkasink.kafka.topic = log
agent.sinks.kafkasink.kafka.bootstrap.servers = localhost:9092
agent.sinks.kafkasink.kafka.producer.acks = 1
agent.sinks.kafkasink.kafka.flumeBatchSize = 20

#Specify the channel the sink should use
agent.sinks.kafkasink.channel = memoryChannel

# Each channel's type is defined.
agent.channels.memoryChannel.type = memory

# Other config values specific to each type of channel(sink or source)
# can be defined as well
# In this case, it specifies the capacity of the memory channel
agent.channels.memoryChannel.capacity = 10000

具体路径及主机名需要根据自己的按情况来编写。

配置好后，启动flume：

./bin/flume-ng agent -c ./conf/ -f ./conf/log-kafka.properties -n agent -Dflume.root.logger=INFO,console

5.4.6 启动业务系统后台

将业务代码加入系统中。注意在src/main/resources/ 下的 log4j.properties中，log4j.appender.file.File的值应该替换为自己的日志目录，与flume中的配置应该相同。
启动业务系统后台，访问localhost:8088/index.html；点击某个商品进行评分，查看实时推荐列表是否会发生变化。

有需要的话，私信我，业务代码为一个前端+后端项目，代码比较繁多。