An Introduction to Apache Kafka

Apache Kafka is a distributed event streaming platform. It provides a unified, high-throughput, highly scalable, fault-tolerant, low-latency platform for handling real-time data feeds. Kafka combines three key capabilities for end-to-end event streaming with a single battle-tested solution:

  1. To publish (write) and subscribe to (read) streams of events, including continuous import/export of your data from other systems.

  2. To store streams of events durably and reliably for as long as you want.

  3. To process streams of events as they occur or retrospectively.

Fun Fact: Kafka can be deployed on bare-metal hardware, virtual machines and containers, and on-premises as well as in the cloud. You can choose between self-managing your Kafka environments and using fully managed services offered by a variety of vendors.

This article is divided into two sections:

  1. Introduction

  2. Working

1. Introduction

Kafka comprises servers and clients that communicate via a high-performance TCP network protocol.

Servers

Kafka is run as a cluster of one or more servers that can span multiple data centers or cloud regions. Some of those servers form the storage layer, called the brokers. Other servers run Kafka Connect to continuously import and export data as event streams. Kafka maintains replicas of the data across brokers, which ensures continuous operation without any data loss.

Kafka Connect is a framework for connecting Kafka with external systems such as databases, key-value stores, search indexes and file systems. Its main purpose is to stream data to and from Kafka.

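As a sketch of how this looks in practice, the Kafka distribution ships with a standalone Connect mode and a simple file connector (the config files below come with the distribution, though their exact contents may vary by version). This streams the lines of test.txt into a connect-test topic:

$ bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties

Here config/connect-file-source.properties looks roughly like:

name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=test.txt
topic=connect-test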

Clients

Clients subscribe to the Kafka stream. You can configure your Kafka client for parallel and batch reading. Clients send an acknowledgment to the servers, which in turn advances the consumer's offset.

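For illustration, here is a minimal sketch of such a client using the official Java consumer (org.apache.kafka:kafka-clients); the topic name and group id are placeholders. Batch reading per poll is controlled by max.poll.records, and the manual commit after processing plays the role of the acknowledgment described above; parallel reading comes from running several instances of this process with the same group.id, as covered in the consumer groups section later.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class BatchReader {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "mediumGroup");      // illustrative group id
        props.put("enable.auto.commit", "false");  // we acknowledge manually below
        props.put("max.poll.records", "500");      // upper bound on the batch per poll
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("mediumTopic"));
            while (true) {
                // Fetch the next batch of events (up to max.poll.records).
                ConsumerRecords<String, String> batch = consumer.poll(Duration.ofMillis(1000));
                for (ConsumerRecord<String, String> record : batch) {
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                }
                consumer.commitSync(); // the "acknowledgment" that advances the group's offset
            }
        }
    }
}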

Topics

Events are pushed to topics. Topics in Kafka are multi-producer and multi-subscriber in nature. Events in a topic can be read as often as needed, since events are not deleted after consumption. Instead, you define how long Kafka should retain your events through a per-topic configuration setting, after which old events are discarded.

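As a hedged example, a per-topic retention of seven days (604800000 ms) could be set when creating the topic; the topic name and partition counts here are illustrative, and older Kafka versions may expect a --zookeeper flag instead of --bootstrap-server:

$ bin/kafka-topics.sh --create --topic mediumTopic --partitions 3 --replication-factor 2 --config retention.ms=604800000 --bootstrap-server localhost:9092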

Topics can be partitioned over multiple brokers to provide a distributed platform. This means client applications can both read and write data from/to many brokers at the same time. When an event is published to a topic, it is appended to one of the topic’s partitions. Note: events with the same event key are written to the same partition, and Kafka guarantees that events in a partition are consumed in the same order in which they were written.

[Figure: Representation of publishing a Kafka event to a single topic having multiple partitions present in different brokers.]

As multiple client applications will be reading the data simultaneously, the Kafka brokers should be highly available and fault-tolerant. So Kafka provides topic replication, ensuring there are always multiple brokers holding a copy of the data: the copies act as a backup, and multiple consumers can read from the replicated partition.

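You can inspect how a topic's partitions and replicas are spread across brokers with the bundled tooling. The output below is a sketch with illustrative broker ids; Leader is the broker serving that partition, and Isr lists its in-sync replicas:

$ bin/kafka-topics.sh --describe --topic mediumTopic --bootstrap-server localhost:9092
Topic: mediumTopic  PartitionCount: 3  ReplicationFactor: 2
    Topic: mediumTopic  Partition: 0  Leader: 1  Replicas: 1,2  Isr: 1,2
    Topic: mediumTopic  Partition: 1  Leader: 2  Replicas: 2,0  Isr: 2,0
    Topic: mediumTopic  Partition: 2  Leader: 0  Replicas: 0,1  Isr: 0,1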

Enough with the introduction; let’s move on to the working of Kafka.

2. Working

Producers

A producer is a service that hands data over to Kafka to publish events to a partition. The client controls which partition it publishes messages to. This can be done at random, implementing a kind of random load balancing, or it can be done by choosing some semantic key to partition by and hashing it to a partition. For example, if the key chosen was a user id, then all data for a given user would be sent to the same partition.

Fun Fact: Event Batching provides efficiency by attempting to accumulate data in memory to send out larger batches in a single request.

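To make the key-based partitioning and batching concrete, here is a minimal sketch using the official Java producer. The topic name, user ids and config values are placeholders; linger.ms and batch.size are the standard knobs behind the batching behaviour mentioned in the fun fact above:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class UserEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("linger.ms", "10");      // wait up to 10 ms to accumulate a batch in memory
        props.put("batch.size", "32768");  // target batch size in bytes per partition

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Records with the same key (here, a user id) hash to the same partition,
            // so both events for "user-42" stay in order on one partition.
            producer.send(new ProducerRecord<>("mediumTopic", "user-42", "logged-in"));
            producer.send(new ProducerRecord<>("mediumTopic", "user-42", "viewed-article"));
            producer.send(new ProducerRecord<>("mediumTopic", "user-7", "logged-in"));
        }
    }
}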

To publish an event through the terminal, you will need to run the Kafka server as well as the Zookeeper server.

Download the Kafka files from here. Unzip the archive and cd into the Kafka folder.

$ bin/zookeeper-server-start.sh config/zookeeper.properties

This will bring up your Zookeeper server. In a nutshell, Zookeeper elects the partition leader, and when the leader fails, it elects a new leader from the available replicas.

$ bin/kafka-server-start.sh config/server.properties

This will bring up the Kafka server. By default, the Kafka server runs on port 9092, but you can configure this in the config file.

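For instance, the port is set in config/server.properties. A rough excerpt of the relevant keys (the values shown are the usual defaults) looks like this:

# config/server.properties (excerpt)
broker.id=0
listeners=PLAINTEXT://localhost:9092
log.dirs=/tmp/kafka-logs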

To produce a Kafka event on the terminal:

$ bin/kafka-console-producer.sh --topic mediumTopic --bootstrap-server localhost:9092

In the command, we define the server endpoint and port; in our case, the Kafka topic to which we are publishing the event is mediumTopic.

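The console producer reads one event per line from stdin. If you also want to attach an event key from the terminal, so that related events land on the same partition as noted earlier, the tool accepts key-parsing properties; the separator character here is an arbitrary choice:

$ bin/kafka-console-producer.sh --topic mediumTopic --bootstrap-server localhost:9092 --property parse.key=true --property key.separator=:
>user-42:logged-in
>user-7:logged-in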

Kafka Server/Broker

In layman’s terms, a Kafka server works as a file system where events are stored under partitions. Multiple Kafka servers combine to make a Kafka cluster.

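You can see this "file system" directly on disk. With the default log.dirs=/tmp/kafka-logs setting, each partition of a topic is a directory of segment and index files; the listing below is a sketch of the usual layout:

$ ls /tmp/kafka-logs/mediumTopic-0
00000000000000000000.index
00000000000000000000.log
00000000000000000000.timeindex
leader-epoch-checkpoint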

Zookeeper

Zookeeper provides an abstraction layer that ensures the availability of Kafka brokers. It checks which brokers are alive and are part of the cluster. It elects the primary controller. A controller is one of the brokers and is responsible for maintaining the leader/follower relationship for all the partitions.

Consumers/Consumer Groups

A consumer is a single server that subscribes to the cluster to consume messages, but these days we use a microservice architecture and our services are deployed on multiple instances. To ensure a message is read only a single time by the service, we use consumer groups. So, if there are 4 partitions and 3 consumer instances, Kafka will balance the partitions across the instances, giving each instance 1 or 2 partitions, so that no duplicate read occurs. To read a Kafka message from the terminal, run the command below:

$ bin/kafka-console-consumer.sh --topic mediumTopic --from-beginning --bootstrap-server localhost:9092
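
To exercise consumer groups from the terminal, start several copies of the consumer with the same --group name (mediumGroup is an illustrative name); Kafka will split the topic's partitions among them:

$ bin/kafka-console-consumer.sh --topic mediumTopic --group mediumGroup --bootstrap-server localhost:9092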

You can also configure the message persistence time. Every time a new consumer joins, the messages are read from the beginning, or you can provide a manual offset as a parameter in the command.

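For the manual-offset case, the console consumer can start from a specific offset of a specific partition; --offset also accepts earliest and latest, and the partition and offset values below are arbitrary:

$ bin/kafka-console-consumer.sh --topic mediumTopic --partition 0 --offset 42 --bootstrap-server localhost:9092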

If this article was helpful to you, then please don’t forget to give it a cheer. Support is something that keeps one motivated. Peace ✌🏻

Translated from: https://medium.com/swlh/an-introduction-to-apache-kafka-33a8dcf4def8
