Kafka 1.0 Chinese Documentation (10): Kafka Streaming


小南瓜瓜


Article link: https://blog.csdn.net/memoordit/article/details/79131941


1 Run the demo
2 Programming guide for stream processing applications

9.1 Run the Demo

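Kafka Streams is a client library for building mission-critical real-time applications and microservices, where the input and/or output data is stored in Kafka clusters. Kafka Streams combines the simplicity of writing and deploying standard Java and Scala applications on the client side with the benefits of Kafka's server-side cluster technology, making these applications highly scalable, elastic, fault-tolerant, distributed, and more. This quickstart example demonstrates how to run a streaming application written with this library.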

Here is the gist of the WordCountDemo example code (converted to use Java 8 lambda expressions for easier reading):

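// Serializers/deserializers (serdes) for the String and Long types
final Serde<String> stringSerde = Serdes.String();
final Serde<Long> longSerde = Serdes.Long();

// The builder that collects the processing topology
final StreamsBuilder builder = new StreamsBuilder();

// Construct a KStream from the input topic "streams-plaintext-input", where message values
// represent lines of text (for the sake of this example, we ignore whatever may be stored
// in the message keys).
KStream<String, String> textLines = builder.stream("streams-plaintext-input",
    Consumed.with(stringSerde, stringSerde));

KTable<String, Long> wordCounts = textLines
    // Split each text line into words on runs of non-word characters (regex \W+).
    .flatMapValues(value -> Arrays.asList(value.toLowerCase().split("\\W+")))

    // Group by the word itself, so that each word becomes the message key.
    // groupBy() without explicit serdes uses the configured default serdes to repartition
    // the re-keyed records, so the defaults must be String serdes here.
    .groupBy((key, value) -> value)

    // Count the occurrences of each word (message key).
    .count();

// Store the running counts as a changelog stream in the output topic.
wordCounts.toStream().to("streams-wordcount-output", Produced.with(stringSerde, longSerde));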
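The flatMapValues step relies on String.split with the regex \W+. As a minimal stdlib-only sketch (the class WordTokenizer and the helper tokenizeNonEmpty are hypothetical, not part of the demo), the code below reproduces that expression outside Kafka so its edge cases are visible: an empty line, or a line that starts with whitespace or punctuation, yields an empty-string "word", and because \W is ASCII-only by default in Java, non-ASCII letters (Chinese characters, for example) are treated as separators.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stdlib-only helper that mirrors the flatMapValues() expression in the gist.
public class WordTokenizer {

    // Same expression as the demo: lower-case the line, then split on runs of non-word characters.
    public static List<String> tokenize(String line) {
        return Arrays.asList(line.toLowerCase().split("\\W+"));
    }

    // Hypothetical variant, not part of the demo: drops the empty token that split()
    // yields when a line is empty or starts with a non-word character.
    public static List<String> tokenizeNonEmpty(String line) {
        List<String> words = new ArrayList<>();
        for (String word : tokenize(line)) {
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("All streams lead to Kafka")); // [all, streams, lead, to, kafka]
        System.out.println(tokenize("  hello, Kafka!"));           // [, hello, kafka]
        System.out.println(tokenizeNonEmpty("  hello, Kafka!"));   // [hello, kafka]
    }
}
```

Whether to filter empty tokens is a design choice; the gist as written does not, so such input would be counted under an empty-string key.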

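The WordCountDemo application implements the WordCount algorithm, which computes a word-occurrence histogram from the input text. Unlike other WordCount examples you may have seen that operate on bounded (static) data, however, this demo behaves slightly differently because it is designed to run on an infinite, unbounded stream of data. Like the bounded variant, it is a stateful algorithm that tracks and updates the counts of words. But since it must assume potentially unbounded input, it periodically outputs its current state and results while continuing to process more data, because it cannot know when it has processed "all" of the input.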
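The stateful, never-finished behaviour described above can be modelled without a broker. The sketch below is an illustration under stated assumptions, not Kafka Streams code: the class StreamingWordCount is hypothetical, a HashMap stands in for the state store behind the KTable, and process() returns one (word, count) update per word, which roughly corresponds to the changelog the demo writes when record caching is disabled. In the real application the re-keyed records pass through a repartition topic, so the order of updates across different words may differ.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stdlib-only model of the stateful WordCount; not Kafka Streams code.
public class StreamingWordCount {

    // Stands in for the state store that backs the KTable of counts.
    private final Map<String, Long> counts = new HashMap<>();

    // Handles one input record (a text line) and returns the (word, updated count) pairs
    // it produces: one changelog update per word, as with record caching disabled.
    public List<Map.Entry<String, Long>> process(String line) {
        List<Map.Entry<String, Long>> updates = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            long newCount = counts.merge(word, 1L, Long::sum);
            updates.add(new AbstractMap.SimpleEntry<String, Long>(word, newCount));
        }
        return updates;
    }

    // Current value for a word; with unbounded input there is never a "final" count.
    public long countOf(String word) {
        return counts.getOrDefault(word, 0L);
    }

    public static void main(String[] args) {
        StreamingWordCount wordCount = new StreamingWordCount();
        String[] lines = {"all streams lead to kafka", "hello kafka streams"};
        for (String line : lines) {
            for (Map.Entry<String, Long> update : wordCount.process(line)) {
                System.out.println(update.getKey() + "\t" + update.getValue());
            }
        }
    }
}
```

Feeding the sketch the two lines used later in this quickstart, all streams lead to kafka followed by hello kafka streams, makes the second call return hello=1, kafka=2, streams=2: only the words touched by the new record are re-emitted, each with its updated running count.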

This walkthrough assumes that Kafka and ZooKeeper have already been started.

1: Prepare the input topic and start a Kafka producer

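Next, we create the input topic named streams-plaintext-input and the output topic named streams-wordcount-output. The output topic is created with log compaction enabled (cleanup.policy=compact) because its content is a changelog stream, for which only the latest count per word needs to be kept: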

> bin/kafka-topics.sh --create \
    --zookeeper localhost:2181 \
    --replication-factor 1 \
    --partitions 1 \
    --topic streams-plaintext-input

Created topic "streams-plaintext-input".

> bin/kafka-topics.sh --create \
    --zookeeper localhost:2181 \
    --replication-factor 1 \
    --partitions 1 \
    --topic streams-wordcount-output \
    --config cleanup.policy=compact

Created topic "streams-wordcount-output".
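As a rough model of what cleanup.policy=compact does to the output changelog, the stdlib-only sketch below (the class CompactionModel is hypothetical) computes the end state that compaction converges to: it replays records in offset order, keeps the last value per key, and orders the survivors by the offset of that last record. Real compaction is performed in the background by the broker's log cleaner and never touches the active segment, so consumers can still see superseded values for a while.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stdlib-only model of the end state that log compaction converges to.
public class CompactionModel {

    // Replays (key, value) records in offset order and keeps only the latest value per key.
    // Removing before re-inserting keeps the map ordered by the offset of each surviving record.
    public static Map<String, Long> compact(String[] keys, long[] values) {
        if (keys.length != values.length) {
            throw new IllegalArgumentException("keys and values must have the same length");
        }
        Map<String, Long> latest = new LinkedHashMap<>();
        for (int i = 0; i < keys.length; i++) {
            latest.remove(keys[i]);
            latest.put(keys[i], values[i]);
        }
        return latest;
    }

    public static void main(String[] args) {
        // Hypothetical changelog of running counts for the words used in this quickstart.
        String[] keys = {"all", "streams", "lead", "to", "kafka", "hello", "kafka", "streams"};
        long[] values = {1, 1, 1, 1, 1, 1, 2, 2};
        System.out.println(compact(keys, values));
        // {all=1, lead=1, to=1, hello=1, kafka=2, streams=2}
    }
}
```

The compacted view is exactly the current word-count table, which is why a compacted topic is a natural home for the output of a KTable.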

The created topics can be described with the same kafka-topics tool:

> bin/kafka-topics.sh --zookeeper localhost:2181 --describe

Topic:streams-plaintext-input   PartitionCount:1    ReplicationFactor:1 Configs:
    Topic: streams-plaintext-input  Partition: 0    Leader: 0   Replicas: 0 Isr: 0
Topic:streams-wordcount-output  PartitionCount:1    ReplicationFactor:1 Configs:
    Topic: streams-wordcount-output Partition: 0    Leader: 0   Replicas: 0 Isr: 0

2: Start the WordCount application

The following command starts the WordCount demo application:

> bin/kafka-run-class.sh org.apache.kafka.streams.examples.wordcount.WordCountDemo

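The demo application reads from the input topic streams-plaintext-input, performs the WordCount computation on each message it reads, and continuously writes its current results to the output topic streams-wordcount-output. Hence there will be no STDOUT output apart from log entries, because the results are written back into Kafka.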
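The command above runs the demo with its built-in settings. For orientation, here is a stdlib-only sketch of the kind of Properties a Kafka Streams application such as WordCountDemo is configured with; the class name is hypothetical and the exact values the shipped demo uses are an assumption. The literal keys are the string values behind StreamsConfig and ConsumerConfig constants (for example, StreamsConfig.APPLICATION_ID_CONFIG is "application.id"), written out so the snippet compiles without the Kafka jars.

```java
import java.util.Properties;

// Hypothetical stdlib-only sketch of a WordCount-style Kafka Streams configuration.
// Keys are written as string literals so this compiles without the Kafka jars;
// the chosen values are assumptions, not copied from the shipped demo.
public class WordCountConfigSketch {

    public static Properties wordCountProps() {
        Properties props = new Properties();
        // Identifies the application; also used as the consumer group id and as the
        // prefix of internal topic names.
        props.setProperty("application.id", "streams-wordcount");
        props.setProperty("bootstrap.servers", "localhost:9092");
        // Default serdes apply wherever no serde is passed explicitly, e.g. when groupBy()
        // repartitions the re-keyed (word, word) records.
        props.setProperty("default.key.serde",
                "org.apache.kafka.common.serialization.Serdes$StringSerde");
        props.setProperty("default.value.serde",
                "org.apache.kafka.common.serialization.Serdes$StringSerde");
        // 0 disables record caching, so every count update is forwarded downstream.
        props.setProperty("cache.max.bytes.buffering", "0");
        // Start from the earliest offset when no committed offset exists yet.
        props.setProperty("auto.offset.reset", "earliest");
        return props;
    }

    public static void main(String[] args) {
        wordCountProps().list(System.out);
    }
}
```

In real code these Properties would be passed, together with builder.build(), to the KafkaStreams constructor before calling start().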

Now we can start the console producer in a separate terminal to write some input data to this topic:

> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic streams-plaintext-input

Then inspect the output of the WordCount demo application by reading its output topic with the console consumer in a separate terminal:

> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
    --topic streams-wordcount-output \
    --from-beginning \
    --formatter kafka.tools.DefaultMessageFormatter \
    --property print.key=true \
    --property print.value=true \
    --property key.deserializer=org.apache.kafka.common.serialization.StringDeserializer \
    --property value.deserializer=org.apache.kafka.common.serialization.LongDeserializer

3: Process some data

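Now let's write some messages into the input topic streams-plaintext-input with the console producer, by entering a single line of text and then pressing Enter. This sends a new message to the input topic, with a null message key and, as the message value, the string-encoded text line you just entered. (In practice, an application's input data would typically stream into Kafka continuously, rather than being entered manually as in this quickstart.)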

> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic streams-plaintext-input

all streams lead to kafka

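This message is processed by the WordCount application, and the following output data is written to the streams-wordcount-output topic and printed by the console consumer: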

> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
    --topic streams-wordcount-output \
    --from-beginning \
    --formatter kafka.tools.DefaultMessageFormatter \
    --property print.key=true \
    --property print.value=true \
    --property key.deserializer=org.apache.kafka.common.serialization.StringDeserializer \
    --property value.deserializer=org.apache.kafka.common.serialization.LongDeserializer

all     1
streams 1
lead    1
to      1
kafka   1

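Here, the first column is the Kafka message key in java.lang.String format, representing the word being counted, and the second column is the message value in java.lang.Long format, representing the word's latest count.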
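The two columns print as readable text only because the consumer command sets StringDeserializer and LongDeserializer. The stdlib-only sketch below (class and method names are hypothetical) mimics the wire format those serializers use: keys as UTF-8 bytes, the default encoding of Kafka's StringSerializer, and counts as 8-byte big-endian longs, the format of LongSerializer. Without the value.deserializer property, the console consumer would print the raw 8 bytes, which show up as mostly unprintable characters.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stdlib-only sketch of the bytes behind the consumer's two columns.
public class WireFormatSketch {

    // Keys: the word as UTF-8 bytes (StringSerializer's default encoding).
    public static byte[] encodeKey(String word) {
        return word.getBytes(StandardCharsets.UTF_8);
    }

    // Values: the count as 8 bytes, most significant byte first (ByteBuffer's default order).
    public static byte[] encodeCount(long count) {
        return ByteBuffer.allocate(Long.BYTES).putLong(count).array();
    }

    // Reverse of encodeCount; rejects payloads that are not exactly 8 bytes long.
    public static long decodeCount(byte[] bytes) {
        if (bytes.length != Long.BYTES) {
            throw new IllegalArgumentException("expected 8 bytes, got " + bytes.length);
        }
        return ByteBuffer.wrap(bytes).getLong();
    }

    public static void main(String[] args) {
        byte[] key = encodeKey("kafka");
        byte[] value = encodeCount(2L);
        System.out.println(Arrays.toString(value)); // [0, 0, 0, 0, 0, 0, 0, 2]
        System.out.println(new String(key, StandardCharsets.UTF_8) + "\t" + decodeCount(value));
    }
}
```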

Now let's continue by writing one more message into the input topic streams-plaintext-input with the console producer:

> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic streams-plaintext-input

all streams lead to kafka
hello kafka streams
