A First Look at Kafka Streams, Part 1: Running the Kafka Streams Demo Application

知行，自省

Article link: https://blog.csdn.net/u013098162/article/details/106720705

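Kafka Streams is a client library for building mission-critical real-time applications and microservices whose input and output data are stored in Kafka clusters. It combines the simplicity of writing and deploying standard Java and Scala applications on the client side with the benefits of Kafka's server-side cluster technology, which makes these applications highly scalable, elastic, fault-tolerant, and distributed.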

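This post uses WordCountDemo as the example code to show how to run a streaming application built with the Kafka Streams library. The gist of the WordCountDemo code (converted to Java 8 lambda expressions for readability):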

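import java.util.Arrays;

import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

// Serializers/deserializers (serde) for String and Long types
final Serde<String> stringSerde = Serdes.String();
final Serde<Long> longSerde = Serdes.Long();

// Builder used to define the processing topology
final StreamsBuilder builder = new StreamsBuilder();

// Construct a `KStream` from the input topic "streams-plaintext-input", where message values
// represent lines of text (for the sake of this example, we ignore whatever may be stored
// in the message keys).
KStream<String, String> textLines = builder.stream(
      "streams-plaintext-input",
      Consumed.with(stringSerde, stringSerde)
    );

KTable<String, Long> wordCounts = textLines
    // Lower-case each text line and split it into words on runs of non-word characters.
    .flatMapValues(value -> Arrays.asList(value.toLowerCase().split("\\W+")))

    // Re-key each record by the word itself, so records are grouped by word.
    .groupBy((key, value) -> value)

    // Count the occurrences of each word (message key).
    .count();

// Store the running counts as a changelog stream to the output topic.
wordCounts.toStream().to("streams-wordcount-output", Produced.with(stringSerde, longSerde));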
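To see what the flatMapValues step produces for a single line, here is a minimal plain-Java sketch with no Kafka dependency. The class and method names (SplitWordsSketch, splitWords) are hypothetical and exist only for illustration; the expression inside is the same one used in the lambda above.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch; SplitWordsSketch and splitWords are hypothetical names.
final class SplitWordsSketch {

    // Same expression as the flatMapValues lambda: lower-case the line, then
    // split on runs of non-word characters (anything outside [a-zA-Z0-9_]).
    static List<String> splitWords(String line) {
        return Arrays.asList(line.toLowerCase().split("\\W+"));
    }

    public static void main(String[] args) {
        // Whitespace-separated input
        System.out.println(splitWords("all streams lead to kafka"));
        // Punctuation is treated as a separator as well
        System.out.println(splitWords("Hello, Kafka Streams!"));
    }
}
```

Note that String.split drops trailing empty strings but not a leading one, so a line that starts with a separator (for example a leading space) yields an empty first token, which this pipeline would count as a word.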

1. Download and Install Kafka

Download kafka_2.12-2.5.0 from: https://www.apache.org/dyn/closer.cgi?path=/kafka/2.5.0/kafka_2.12-2.5.0.tgz

tar -xzvf kafka_2.12-2.5.0.tgz
cd kafka_2.12-2.5.0/

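2. Start the Kafka Server

This Kafka release (2.5.0) depends on ZooKeeper, so the ZooKeeper server must be started first. Kafka ships with a single-node ZooKeeper that can be started with the bundled script: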

bin/zookeeper-server-start.sh config/zookeeper.properties

Figure 00: Starting the single-node ZooKeeper

Then, in a second terminal, start the Kafka server:

bin/kafka-server-start.sh config/server.properties

Figure 01: Starting Kafka

3. Prepare the Input Topic and Start the Kafka Producer

Create the input topic streams-plaintext-input:

bin/kafka-topics.sh --create \
    --bootstrap-server localhost:9092 \
    --replication-factor 1 \
    --partitions 1 \
    --topic streams-plaintext-input

Figure 02: Creating the input topic

Because the output stream is a changelog stream, create the output topic streams-wordcount-output with log compaction enabled:

bin/kafka-topics.sh --create \
    --bootstrap-server localhost:9092 \
    --replication-factor 1 \
    --partitions 1 \
    --topic streams-wordcount-output \
    --config cleanup.policy=compact

Figure 03: Creating the output topic
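The effect of cleanup.policy=compact can be pictured with a small plain-Java sketch: for each key, only the most recent record is guaranteed to survive. This is only a model of the end result, not how the broker implements it; real compaction runs in the background and does not touch the active log segment, so a consumer may still see older records for a key. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of the end result of log compaction; names are hypothetical.
final class CompactionSketch {

    // Keeps only the latest value per key, ordered by the position of the
    // surviving (latest) record in the original log.
    static Map<String, Long> compact(List<Map.Entry<String, Long>> log) {
        Map<String, Long> latest = new LinkedHashMap<>();
        for (Map.Entry<String, Long> record : log) {
            // Remove the older record first so the newer one takes its later position.
            latest.remove(record.getKey());
            latest.put(record.getKey(), record.getValue());
        }
        return latest;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> log = List.of(
                Map.entry("kafka", 1L),
                Map.entry("streams", 1L),
                Map.entry("kafka", 2L),
                Map.entry("kafka", 3L));
        // Only the latest count per word remains.
        System.out.println(compact(log));
    }
}
```

For a word-count changelog this is exactly what is wanted: an older count for a word is superseded by the newer one, so dropping it loses no information.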

Use the kafka-topics tool to describe the topics that were created:

bin/kafka-topics.sh --bootstrap-server localhost:9092 --describe

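Figure 04: Describing the topics

4. Start the WordCount Application

The demo application reads messages from the input topic streams-plaintext-input, runs the WordCount computation on them, and continuously writes the current results to the output topic streams-wordcount-output. As a result, it prints nothing to STDOUT other than log entries.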

bin/kafka-run-class.sh org.apache.kafka.streams.examples.wordcount.WordCountDemo

Figure 05: Running the WordCountDemo example

In a separate terminal, start the console producer to write data to the input topic:

bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic streams-plaintext-input

Figure 06: Console producer

In another terminal, use the console consumer to read the results from the output topic:

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
    --topic streams-wordcount-output \
    --from-beginning \
    --formatter kafka.tools.DefaultMessageFormatter \
    --property print.key=true \
    --property print.value=true \
    --property key.deserializer=org.apache.kafka.common.serialization.StringDeserializer \
    --property value.deserializer=org.apache.kafka.common.serialization.LongDeserializer

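Figure 07: Console consumer

The first column is the Kafka message key, a java.lang.String holding the word being counted; the second column is the message value, a java.lang.Long holding the latest count for that word.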
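The value.deserializer property matters because the counts are stored as binary longs rather than as text. The following plain-Java sketch illustrates the 8-byte big-endian layout that a long serializer of this kind produces; it is an illustration written for this post, not Kafka's own code, and the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Plain-Java illustration of an 8-byte big-endian long encoding; names are hypothetical.
final class LongValueSketch {

    // Encodes a count as 8 bytes (ByteBuffer is big-endian by default).
    static byte[] encode(long count) {
        return ByteBuffer.allocate(Long.BYTES).putLong(count).array();
    }

    // Decodes the 8 bytes back into a long.
    static long decode(byte[] raw) {
        return ByteBuffer.wrap(raw).getLong();
    }

    public static void main(String[] args) {
        byte[] raw = encode(3L);
        // The raw bytes are not printable text.
        System.out.println(Arrays.toString(raw));
        // Decoding recovers the readable count.
        System.out.println(decode(raw));
    }
}
```

Without a matching deserializer the console consumer would have to treat these bytes as text, and the count column would show unprintable characters instead of numbers.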

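5. Process Data

Use the console producer to write some messages to the input topic streams-plaintext-input: type a line of text and end it with RETURN. Each line is sent to the input topic as one message whose key is null and whose value is the string-encoded line you typed (in practice, an application's input data would usually stream into Kafka continuously).

Type: all streams lead to kafka

Then type: hello kafka streams

Finally type: join kafka summit

Figure 08: Console producer

Figure 09: Console consumer

The two diagrams below illustrate what happens behind the scenes in this scenario. Here Kafka Streams takes advantage of the duality between a table and a changelog stream (table = KTable, changelog stream = the downstream KStream): every change to the table can be published to a stream, and consuming the entire changelog stream from beginning to end reconstructs the contents of the table.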
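The duality described above can be sketched in plain Java without any Kafka dependency. The first method plays the role of the table emitting one change record per update; the second rebuilds the table by replaying those records. The sketch assumes that every single update is forwarded downstream (no record caching coalesces consecutive updates for a key), and all class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of the table/changelog duality; names are hypothetical.
final class TableChangelogSketch {

    // Table -> stream: update the count table word by word and publish
    // every change as a (word, newCount) record.
    static List<Map.Entry<String, Long>> countToChangelog(List<String> lines) {
        Map<String, Long> table = new LinkedHashMap<>();
        List<Map.Entry<String, Long>> changelog = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.toLowerCase().split("\\W+")) {
                long updated = table.merge(word, 1L, Long::sum);
                changelog.add(Map.entry(word, updated));
            }
        }
        return changelog;
    }

    // Stream -> table: replay the changelog from beginning to end;
    // a later record for a key overwrites the earlier one.
    static Map<String, Long> replay(List<Map.Entry<String, Long>> changelog) {
        Map<String, Long> table = new LinkedHashMap<>();
        for (Map.Entry<String, Long> change : changelog) {
            table.put(change.getKey(), change.getValue());
        }
        return table;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "all streams lead to kafka",
                "hello kafka streams",
                "join kafka summit");
        List<Map.Entry<String, Long>> changelog = countToChangelog(lines);
        // One line per update, in the same key/value layout the console consumer prints.
        changelog.forEach(c -> System.out.println(c.getKey() + "\t" + c.getValue()));
        // The table rebuilt purely from the changelog.
        System.out.println(replay(changelog));
    }
}
```

For the three lines typed above, the changelog contains eleven records (for example, kafka appears with counts 1, 2, and 3), while the rebuilt table holds one entry per distinct word with its final count, such as kafka=3 and streams=2.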