An Introduction to Apache Kafka

Kafka's history

Before we dive in deep into how Kafka works and get our hands messy, here's a little backstory.

Kafka is named after the acclaimed German-language writer Franz Kafka and was created by LinkedIn as a result of the growing need for a fault-tolerant, redundant way to handle its connected systems and ever-growing pool of data.

What is Kafka?

Briefly put, Apache Kafka is a distributed streaming platform.

Let's take a deeper look at what this means:

  1. Distribution: Generally, in software architecture, distribution is a measure of how autonomously the components in a system can operate. Here are a few defining characteristics of distribution:

    • Several computational entities, workers, nodes or components working to achieve a singular goal.

    • Segregation of work between the components in the system.

    • Concurrency of components: each component performs operations in its own order, independent of the order of operations of other components in the system.

    • Lack of a global clock: the functions of the components in the system are not synchronised.

    • Independent failure: if a component in the system fails, its failure should not affect other components.

    • Communication between the nodes of the system is achieved through message passing.

  2. Streaming: This refers to the real-time and continuous nature of the storage, processing and retrieval of Kafka messages.

In this guide, we'll see how Kafka achieves this.

Microservices Architecture

Before we discuss how Kafka works, I think a good place to start would be to explain the context in which Kafka functions.

The microservices software architecture is fast becoming an indispensable paradigm for software engineering and development. As applications grow in scale, that is, as the data they consume, process and output increases, it becomes increasingly important to find fault-tolerant, scalable ways to manage both the systems and the data they handle.

As the name suggests, a microservice is a piece of software that serves a singular purpose and works with other system components to perform a task.

It is here that Kafka shines. It lends itself well to enabling communication between microservices in a system through the passing of messages between them.

Kafka components

Kafka achieves messaging through a publish and subscribe system facilitated by the following components:

Topic

Topics are how Kafka stores and organises messages across its system; they are essentially collections of messages. Visualise them as logs in which Kafka stores messages. Topics can be replicated (copies are created) and partitioned (divided). The ability to replicate and partition topics is one of the factors behind Kafka's fault tolerance and scalability.

Producer

The producer publishes messages to a Kafka topic.

Consumer

The consumer subscribes to one or more topics, then reads and processes the messages they contain.

Broker

Brokers (servers) manage the storage of messages in topics. Several brokers together form a Kafka cluster.

Zookeeper

Kafka uses zookeeper to provide the brokers with metadata about the processes running in the system and to facilitate health checking and broker leadership election.
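
For context, the broker finds zookeeper through the zookeeper.connect entry in config/server.properties; the value shipped with the download points at a local instance:

zookeeper.connect=localhost:2181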

Prerequisites

You'll need Java for this, so go ahead and download the JDK from here.

Download the Kafka release from here and unzip it to a directory of your choice. As of the writing of this article, the current release version is 0.10.1.0.

Doing this with the tar utility is trivial:

tar -xzf kafka_2.11-0.10.1.0.tgz

You can also download Kafka using Homebrew on macOS, but here, we'll use the method above in order to expose the directory structure of Kafka and give us easy access to its files.
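
For reference, the Homebrew route (assuming Homebrew is already installed) is a one-liner:

brew install kafka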

Demonstration

Starting zookeeper

Let's begin by starting up zookeeper by running the following command at the root of the uncompressed folder.

bin/zookeeper-server-start.sh config/zookeeper.properties

The indication that zookeeper has come alive is a stream of log output in your terminal window.

In another shell, let's start a Kafka broker like this:

bin/kafka-server-start.sh config/server.properties

Of note is the fact that we can create multiple Kafka brokers simply by copying the server.properties file and modifying the values of the following fields, which must be unique to each broker (see the sketch after this list):

  1. broker.id
  2. listeners: The first broker listens at localhost:9092, so each additional broker needs its own port.
  3. log.dirs: The physical location where each broker will store its messages.
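
As a sketch, a copy saved as config/server-1.properties (a hypothetical file name) for a second broker might differ only in these lines:

broker.id=1
listeners=PLAINTEXT://localhost:9093
log.dirs=/tmp/kafka-logs-1

You would then start the second broker with bin/kafka-server-start.sh config/server-1.properties.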

Creating a Kafka topic

In another shell, create a Kafka topic called my-kafka-topic like this:

bin/kafka-topics.sh --create --topic my-kafka-topic --zookeeper localhost:2181 --partitions 1 --replication-factor 1

You should receive confirmation that the topic has been created. A few points of note:

  • We invoke zookeeper because it manages metadata relevant to the Kafka brokers/cluster and therefore needs to know when topics are created.
  • The partition count and replication factor can be changed depending on how many partitions and topic replicas you require. Both default to 1 for automatically created topics; these broker-side defaults (num.partitions and default.replication.factor) are configurable in the config/server.properties file.
  • The number of replicas must be equal to or less than the number of brokers currently operational.

If you now look at the zookeeper stream we began earlier, you'll notice that the broker has registered our newly created topic.
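
You can also ask the bundled CLI to describe the new topic; this is a quick check, run from the Kafka root:

bin/kafka-topics.sh --describe --topic my-kafka-topic --zookeeper localhost:2181

The output lists the partition count, the replication factor and which broker leads each partition.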

Creating a Kafka Producer

In another shell, let's create a Kafka producer with this:

bin/kafka-console-producer.sh --broker-list localhost:9092 --topic my-kafka-topic

A few points of note:

  • We invoke the broker we started, listening at localhost:9092, because it manages the storage of messages in topics. More information about this port can be found in the config/server.properties file.
  • We will produce messages to the just-created topic my-kafka-topic.

Creating a Kafka Consumer

In yet another shell, run this to start a Kafka consumer:

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic my-kafka-topic --from-beginning

Now the real fun begins.

In the producer stream, type some messages. Press enter after each one and watch what happens in the shell you started the consumer in.

Drumroll please!

The messages from the producer are appearing in the consumer thread. I must admit, the first time I saw this at work, I was quite impressed.

After you catch your breath, let's do a little snooping to find out how Kafka achieves this.

Now run the following (by default, the broker stores its logs under /tmp/kafka-logs):

cd /tmp/kafka-logs/my-kafka-topic-0
cat 00000000000000000000.log

Here, we find our produced and consumed messages. This is why Kafka is often called a distributed commit log. It functions as an immutable record of messages.

Of note is the fact that you can dictate where the broker saves its messages on disk. This setting can be found in the config/server.properties file.
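
A minimal sketch of the relevant line; this value is the shipped default on the 0.10.x line, which is why the log above lives under /tmp:

log.dirs=/tmp/kafka-logs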

Our first Apache Kafka application

In this section, we'll create an Apache Kafka producer in Python and a Kafka consumer in JavaScript.

Prerequisites

We'll need a few things:

  1. node.js
  2. virtualenv

Confirm you've installed both correctly using:

node --version && virtualenv --version

You should see similar results:

v6.2.2
15.0.3

Directory Structure

Next, create the following folder structure.

scotch-kafka
├── producer
│   └── producer.py
└── consumer
    └── consumer.js

Make sure you've started zookeeper and a broker as above.

Python producer

Make a virtual environment and while inside it, install the Python kafka module by running:

pip install kafka-python

Write the following into the producer.py file:
from kafka import KafkaProducer
import json

# Create an instance of the Kafka producer
producer = KafkaProducer(bootstrap_servers='localhost:9092',
                         value_serializer=lambda v: json.dumps(v).encode('utf-8'))

# Call the producer.send method with a producer record
for message in range(5):
    producer.send('kafka-python-topic', {'values': message})

# send is asynchronous; block until all buffered records are delivered
# so none are lost when the script exits
producer.flush()

Let's break down the producer instantiation.

A few items are required for Kafka to create an instance of a producer. These are usually called producer configuration properties.

  1. A list of brokers to connect to: given here under bootstrap_servers.
  2. Key and value serialisers: these properties tell the producer how to serialise the messages it will publish. In the producer above, we specifically choose a JSON serialisation format for the values; without a value_serializer, kafka-python expects raw bytes.

Of note is the fact that Kafka producer instances can only send producer-record values that match the key and value serialiser types the producer is configured with.

To send the messages, referred to in Kafka terminology as producer records, we need to call the producer.send function and supply both a topic and a value at minimum.

Some optional values we could provide are listed below (see the sketch after this list):

  1. Partition: Refers to the partition within the topic that the message is to be published to.
  2. Timestamp: The time at which the producer published the message. How brokers treat it is configurable using the log.message.timestamp.type setting in the server.properties file.
  3. Key: In conjunction with the partition value, this is used to determine how, and to which partition in a topic, the Kafka producer will send a message.
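
Here is a minimal sketch of those options with kafka-python; the key_serializer and the key/partition values are illustrative assumptions, not part of the original producer:

from kafka import KafkaProducer
import json

# A producer that also serialises keys (assumption: plain string keys)
producer = KafkaProducer(bootstrap_servers='localhost:9092',
                         key_serializer=str.encode,
                         value_serializer=lambda v: json.dumps(v).encode('utf-8'))

# Send a keyed record to an explicit partition of the topic
producer.send('kafka-python-topic', key='device-1', value={'values': 42}, partition=0)
producer.flush()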

Note that the topic we're using has the name kafka-python-topic, so you'll have to create a topic of that name:

bin/kafka-topics.sh --create --topic kafka-python-topic --zookeeper localhost:2181 --partitions 1 --replication-factor 1

JavaScript Consumer

In the scotch-kafka folder, run:

scotch-kafka文件夹中,运行:

npm install no-kafka

Write the following into the consumer.js file.

const Kafka = require('no-kafka');

// Create an instance of the Kafka consumer
const consumer = new Kafka.SimpleConsumer();

// Handler called with each batch of messages fetched from a partition
const dataHandler = function (messageSet, topic, partition) {
    messageSet.forEach(function (m) {
        console.log(topic, partition, m.offset, m.message.value.toString('utf8'));
    });
};

// Connect to the cluster, then subscribe to the Kafka topic
consumer.init().then(function () {
    return consumer.subscribe('kafka-python-topic', dataHandler);
});

In order to create an instance of a consumer, Kafka requires the following:

  1. A list of brokers to connect to. Here, the no-kafka module defaults to localhost:9092; you can override this by passing an options object such as { connectionString: 'localhost:9092' } to the SimpleConsumer constructor.
  2. Key and value deserialisers: Here, each message value arrives as a raw Buffer, which we convert to a UTF-8 string with toString('utf8').

At the root of the project, run the following to start the consumer:

node consumer/consumer.js

and in another shell,

python producer/producer.py

The consumer output should look something like this:

kafka-python-topic 0 216 {"values": 0}
kafka-python-topic 0 217 {"values": 1}
kafka-python-topic 0 218 {"values": 2}
kafka-python-topic 0 219 {"values": 3}
kafka-python-topic 0 220 {"values": 4}

Conclusion

Congratulations, you've just built your first application using Apache Kafka. No mean feat at all.

You'll find all the code we've used here.

I hope this tutorial will act as a foundation from which to learn more about Kafka, its use cases, and the many advantages it has over traditional messaging systems.

Have a minute? Give me some feedback. Drop me a line in the comment box below.

Translated from: https://scotch.io/tutorials/an-introduction-to-apache-kafka
