Apache Kafka is a distributed streaming platform. Let's explain it in more detail. Apache Kafka has three key capabilities: it lets us publish and subscribe to streams of records, similar to a message queue or enterprise messaging system; store streams of records durably; and process streams of records as they occur. Apache Kafka provides a distributed publish-subscribe messaging system and a robust queue that can handle a high volume of data and pass messages from producers to consumers.
Apache Kafka Advantages
Apache Kafka provides a lot of benefits to its users; the most important of them are listed below.
Reliability: Kafka is distributed, partitioned, replicated, and fault-tolerant.
Scalability: Kafka scales out with no downtime.
Durability: messages are persisted on disk as fast as possible.
Performance: Kafka handles a high volume of messages for publishing and subscribing, and provides stable performance even with TBs of messages stored.
Download and Install Kafka For Linux and Windows
We can install Apache Kafka on Linux distributions such as Ubuntu, Mint, Debian, Fedora, and CentOS, and on Windows, because Kafka is Java-based software. If Java is installed on the operating system, we can run Kafka easily.
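We can verify that Java is installed by printing its version like below.
$ java -version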
Download Apache Kafka
We will download Apache Kafka from the following link, which provides us with the nearest mirror to download.
https://www.apache.org/dyn/closer.cgi?path=/kafka/2.2.0/kafka_2.12-2.2.0.tgz
In this case, we will download it on Ubuntu with the wget command.
$ wget http://kozyatagi.mirror.guzel.net.tr/apache/kafka/2.2.0/kafka_2.12-2.2.0.tgz
Extract Downloaded File
We will extract the downloaded file with the tar command like below.
$ tar xvf kafka_2.12-2.2.0.tgz
Then we will enter the extracted directory.
$ cd kafka_2.12-2.2.0/
Start ZooKeeper Server
Apache Kafka is managed with ZooKeeper, so in order to start Kafka we will first start a ZooKeeper server with the provided configuration. We will use the zookeeper-server-start.sh bash script with the provided default configuration file zookeeper.properties.
$ ./bin/zookeeper-server-start.sh config/zookeeper.properties
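Alternatively, we can run the script in the background by adding the -daemon option like below.
$ ./bin/zookeeper-server-start.sh -daemon config/zookeeper.properties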
Start Kafka Server
Now we can start the Kafka server by using kafka-server-start.sh with the configuration file named server.properties.
$ ./bin/kafka-server-start.sh ./config/server.properties
We will see the broker configuration in the console messages as the server starts.
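We can also check that the broker is listening on its default port 9092 with a tool like ss.
$ ss -tln | grep 9092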
Create A Topic
We can create a topic named poftut with the kafka-topics.sh script like below.
$ bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic poftut
Then we can list existing topics with the --list parameter like below.
$ bin/kafka-topics.sh --list --bootstrap-server localhost:9092
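We can also print the partition and replica details of the topic with the --describe parameter like below.
$ bin/kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic poftut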
Send Some Messages
We can send some messages to the created topic with kafka-console-producer.sh like below. Each line we type is sent as a separate message.
$ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic poftut
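We can also pipe a message into the producer non-interactively like below; the message text here is just an example.
$ echo "Hello Kafka" | bin/kafka-console-producer.sh --broker-list localhost:9092 --topic poftut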
Start A Consumer
We can consume the messages in the given topic with kafka-console-consumer.sh like below.
$ bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic poftut --from-beginning
Apache Kafka Use Cases
Apache Kafka can be used in different cases. In this part, we will list the most popular and convenient of them.
Messaging
Kafka can be used as a message broker. Kafka is designed as a robust, stable, high-performance message delivery system. In comparison to most messaging systems, Kafka has better throughput, built-in partitioning, replication, and fault tolerance, which makes it a good solution for small to large scale message processing applications.
Website Activity Tracking
The original use case for Kafka was to rebuild a user activity tracking pipeline as a set of real-time publish-subscribe feeds. Activity tracking is often high volume, as many activity messages are generated for each user page view.
Metrics
Kafka can be used for operational data monitoring. This involves tracking metrics and distributing them to different producers and consumers such as web, mobile, and desktop applications.
Log Aggregation
Log aggregation is a hard job to accomplish. Kafka can be used to collect logs from different producers and senders centrally and feed them to other log components like a SIEM, a log archiver, etc.
Stream Processing
Apache Kafka can process stream data easily. Stream processing can be done in multiple stages, where raw input data is aggregated, enriched, or transformed into new topics for further consumption, as shown in the sketch below.
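As an example, the Kafka distribution ships with a Kafka Streams WordCount demo that reads lines from the topic streams-plaintext-input and writes word counts to streams-wordcount-output; assuming those topics exist, it can be started like below.
$ bin/kafka-run-class.sh org.apache.kafka.streams.examples.wordcount.WordCountDemo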
Commit Log
Kafka can be used as an external commit log for a distributed system. The log can be used to replicate data between nodes and act as a re-sync mechanism.
Apache Kafka Architecture
Apache Kafka has a very simple architecture from the user's point of view. Different actors are used to create, connect, process, and consume data within the Kafka system.
Producers
Producers create data and feed it into the Kafka system in different ways, such as through APIs. Producers also use SDKs and libraries for different programming languages to push the data they create.
Connectors
Connectors are used to create scalable and reliable data streaming between Apache Kafka and other systems. These systems can be databases, file systems, etc., as in the sketch below.
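As a minimal sketch, the Kafka distribution includes a standalone Connect runner together with sample file connector configurations under the config directory; the sample source connector streams the contents of a local file into a topic.
$ ./bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties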
Stream Processors
In some cases, input data should be processed or transformed. Stream processors consume input data from different topics, process or transform it, and output the results to different topics or consumers.
Consumers
Consumers are the entities that mainly get, use, and consume the data Kafka provides. Consumers can use data from different topics, whether that data has been processed or not.