Apache Kafka: A Streaming Platform

weixin_26705651

Original article: https://medium.com/@binary10111010/apache-kafka-a-streaming-platform-57e38f8f9bc1

Life is a stream of events

In a world that produces and depends on data, there is a need for a platform that can handle a continuous flow of it. Kafka is a streaming platform that lets you publish and subscribe to streams of data, store them, and process them.

Kafka differs from traditional messaging systems in a number of core ways. It runs as a cluster and can scale to handle all the applications in even the most massive of companies.

Every enterprise is powered by data. Every application creates data, and every byte of data has a story to tell, something of importance that will inform the next thing to be done.

Before we take a look at the parts of a Kafka system, let's talk about the publish/subscribe messaging concept.

Publish/subscribe messaging is a pattern in which the sender (publisher) of a piece of data (message) does not direct it to a specific receiver. Instead, the publisher classifies the message somehow, and the receiver (subscriber) subscribes to receive certain classes of messages. Publish/subscribe systems often have a broker, a central point where messages are published, to facilitate this.
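
To make the pattern concrete, here is a minimal in-memory sketch. `ToyBroker` and its methods are hypothetical names invented for this illustration and are not part of any Kafka client; the point is only that the publisher names a class of message (a topic) and never a receiver.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class ToyBroker:
    """Hypothetical in-memory broker for illustration; not Kafka's API."""

    def __init__(self) -> None:
        # Maps a message class (topic name) to the callbacks subscribed to it.
        self._subscribers: DefaultDict[str, List[Callable[[bytes], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[bytes], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: bytes) -> int:
        # The publisher only names the topic; the broker fans the message out.
        for callback in self._subscribers[topic]:
            callback(message)
        return len(self._subscribers[topic])


received: List[bytes] = []
broker = ToyBroker()
broker.subscribe("orders", received.append)
delivered = broker.publish("orders", b"order-1")
ignored = broker.publish("payments", b"payment-1")  # nobody subscribed to this topic
```

Unlike this toy, Kafka stores published messages so that subscribers can read them later, which the following sections describe.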

Point-to-point connections between systems lead to duplicated work when events arrive from many different parts of an organization. The need for a distributed publish/subscribe messaging system instead makes Kafka a great approach for handling the massive amount of data produced by each application.

The unit of data:

The unit of data within Kafka is called a message. A message is simply an array of bytes. A message can have an optional bit of metadata, which is referred to as a key. The key is also a byte array and, as with the message, has no specific meaning to Kafka. Keys are used when messages are to be written to partitions in a more controlled manner. The simplest such scheme is to generate a consistent hash of the key and then select the partition number for that message by taking the result of the hash modulo the total number of partitions in the topic. This ensures that messages with the same key are always written to the same partition.
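
The key-to-partition scheme described above can be sketched in a few lines. The function name `choose_partition` is made up for this example, and CRC32 is only a stand-in hash chosen because it ships with Python; Kafka's own default partitioner uses a different hash function (murmur2).

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition: hash the key, then take it modulo the partition count."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # CRC32 is an assumption for illustration, not the hash Kafka uses.
    return zlib.crc32(key) % num_partitions


# The same key always maps to the same partition while the partition count is unchanged.
p1 = choose_partition(b"user-42", 6)
p2 = choose_partition(b"user-42", 6)
```

Note that the mapping depends on the partition count, so changing the number of partitions in a topic changes where a given key lands.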

For efficiency, messages are written into Kafka in batches. A batch is just a collection of messages, all of which are being produced to the same topic and partition.
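
As a rough illustration of batching, the sketch below groups pending records by their (topic, partition) pair, so that each group could be sent in one round trip. The `Record` tuple layout and `build_batches` are assumptions for this example, not a real client's internals.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, int, bytes]  # hypothetical layout: (topic, partition, payload)


def build_batches(records: List[Record]) -> Dict[Tuple[str, int], List[bytes]]:
    """Group payloads so that each batch targets exactly one topic and partition."""
    batches: Dict[Tuple[str, int], List[bytes]] = defaultdict(list)
    for topic, partition, payload in records:
        batches[(topic, partition)].append(payload)
    return dict(batches)


batches = build_batches([
    ("clicks", 0, b"a"),
    ("clicks", 1, b"b"),
    ("clicks", 0, b"c"),
])
```

Larger batches mean fewer round trips but a longer wait before any single message is sent, so batching trades latency for throughput.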

Topics and Partitions

Messages in Kafka are categorized into topics. Topics are additionally broken down into a number of partitions.

A partition is a single log. Messages are written to it in an append-only fashion and are read in order from beginning to end. Partitions are also the way that Kafka provides redundancy and scalability. Each partition can be hosted on a different server, which means that a single topic can be scaled horizontally across multiple servers.
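
A partition's append-only behavior can be modeled with a plain list. `PartitionLog` is a hypothetical class for illustration only; it keeps everything in memory, whereas a real broker writes to disk.

```python
from typing import List


class PartitionLog:
    """Toy append-only log; positions in the list play the role of offsets."""

    def __init__(self) -> None:
        self._messages: List[bytes] = []

    def append(self, message: bytes) -> int:
        # Messages are only ever added at the end; the return value is the new offset.
        self._messages.append(message)
        return len(self._messages) - 1

    def read_from(self, offset: int) -> List[bytes]:
        # Reads proceed in order from the given offset to the end of the log.
        return self._messages[offset:]


log = PartitionLog()
first = log.append(b"m0")
second = log.append(b"m1")
```

Because nothing is ever inserted in the middle, the order in which messages were written within a partition is the order in which they are read.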

A partition is a group of segments.

A segment is an individual file on disk on the broker.
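
One way to picture how a partition splits into segments: assume each segment is identified by the offset of its first message (its base offset), and that the base offsets are kept sorted. Under that assumption, finding the segment that holds a given offset is a binary search. `find_segment` and the sample offsets are hypothetical.

```python
from bisect import bisect_right
from typing import List


def find_segment(base_offsets: List[int], offset: int) -> int:
    """Return the base offset of the segment that would contain `offset`."""
    if not base_offsets or offset < base_offsets[0]:
        raise ValueError("offset precedes the first segment")
    # The owning segment is the last one whose base offset is <= the target offset.
    return base_offsets[bisect_right(base_offsets, offset) - 1]


segment_bases = [0, 1000, 2000]  # hypothetical sorted base offsets of three segments
```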

A stream is a single topic of data, regardless of the number of partitions.

Parts of a Kafka system:

1. Producers: the applications and systems that produce or send data to Kafka. A producer receives an “ack” or “nack” signal from the Kafka system.

By default, the producer does not care which partition a specific message is written to and will balance messages evenly over all partitions of a topic.

“ack”: acknowledged; the Kafka system was able to receive the data.

“nack”: negative acknowledgment; the Kafka system was unable to receive the data for whatever reason. Most producers will then try to send the data again.
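
The ack/nack exchange can be sketched as a small retry loop. Everything here is hypothetical: `send_with_retry` and `flaky_send` are invented for the example, a return value of `True` stands for an ack and `False` for a nack, and a real client would also wait between attempts.

```python
from typing import Callable


def send_with_retry(send: Callable[[bytes], bool], message: bytes, max_attempts: int = 3) -> int:
    """Call `send` until it reports an ack (True); return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        if send(message):
            return attempt
    raise RuntimeError(f"no ack after {max_attempts} attempts")


responses = iter([False, False, True])  # simulated broker replies: nack, nack, ack


def flaky_send(message: bytes) -> bool:
    # Stand-in for a network call; it just replays the simulated replies.
    return next(responses)


attempts_used = send_with_retry(flaky_send, b"event")
```

Retrying after a failure can write the same message more than once if the first attempt actually succeeded but its ack was lost, which is worth keeping in mind when consumers process the data.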

2. Consumers: the applications that read messages.

The consumer subscribes to one or more topics and reads the messages in the order in which they were produced. The consumer keeps track of which messages it has already consumed by keeping track of the offset of messages.

Offset: an integer value that continually increases, which Kafka adds to each message as it is produced.
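
Here is a minimal sketch of offset tracking, with a Python list standing in for one partition. `ToyConsumer` and its `poll` method are hypothetical and only loosely mirror how real clients behave.

```python
from typing import List


class ToyConsumer:
    """Reads one partition in order and remembers where it stopped."""

    def __init__(self, log: List[bytes]) -> None:
        self._log = log
        self.position = 0  # offset of the next message to read

    def poll(self, max_messages: int = 10) -> List[bytes]:
        batch = self._log[self.position:self.position + max_messages]
        # Advancing the position is what marks these messages as consumed.
        self.position += len(batch)
        return batch


partition = [b"m0", b"m1", b"m2"]
consumer = ToyConsumer(partition)
first_poll = consumer.poll(max_messages=2)
second_poll = consumer.poll(max_messages=2)
```

Since the position is just a number, a consumer that stops and restarts can resume where it left off, provided the last offset was saved somewhere durable.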

Consumers work as part of a consumer group, which is one or more consumers that work together to consume a topic. The group ensures that each partition is consumed by only one member.
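
The one-owner-per-partition rule can be illustrated with a simple round-robin assignment. This is only one possible strategy, and `assign_partitions` is a made-up helper, not the assignment logic of any particular Kafka client.

```python
from typing import Dict, List


def assign_partitions(partitions: List[int], members: List[str]) -> Dict[str, List[int]]:
    """Give every partition to exactly one group member, round-robin."""
    if not members:
        raise ValueError("a consumer group needs at least one member")
    assignment: Dict[str, List[int]] = {member: [] for member in members}
    for index, partition in enumerate(partitions):
        assignment[members[index % len(members)]].append(partition)
    return assignment


assignment = assign_partitions([0, 1, 2, 3], ["c1", "c2", "c3"])
```

With four partitions and three members, one member ends up with two partitions. If the group had more members than partitions, the extra members would own nothing and sit idle.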

3. Brokers and Clusters

A single Kafka server is called a broker. The broker receives messages from producers, assigns offsets to them, and commits the messages to storage on disk. It also services consumers, responding to fetch requests for partitions and responding with the messages that have been committed to disk.

Kafka brokers are designed to operate as part of a cluster. Within a cluster of brokers, one broker will also function as the cluster controller (elected automatically from the live members of the cluster).

The controller is responsible for administrative operations, including assigning partitions to brokers and monitoring for broker failures.

A partition is owned by a single broker in the cluster, and that broker is called the leader of the partition. A partition may also be assigned to multiple brokers, which results in the partition being replicated.
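
To visualize leaders and replicas, the sketch below spreads each partition's copies over consecutive brokers and treats the first broker in each list as the leader. The placement rule, the function `assign_replicas`, and the broker ids are all assumptions for illustration; the placement Kafka actually uses is more involved.

```python
from typing import Dict, List


def assign_replicas(num_partitions: int, brokers: List[int], replication_factor: int) -> Dict[int, List[int]]:
    """Place each partition on `replication_factor` distinct brokers, round-robin."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    return {
        p: [brokers[(p + r) % len(brokers)] for r in range(replication_factor)]
        for p in range(num_partitions)
    }


replicas = assign_replicas(num_partitions=3, brokers=[101, 102, 103], replication_factor=2)
# In this sketch, the first broker in each replica list acts as the partition leader.
leaders = {p: assigned[0] for p, assigned in replicas.items()}
```

Spreading leaders over different brokers also spreads the read and write load, since clients talk to the leader of each partition.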

Consumers and producers are decoupled: slow consumers do not impact producers, and adding consumers, or having consumers fail, has no impact on producers either.

Why Kafka

Multiple Producers

Kafka is able to seamlessly handle multiple producers, so it can aggregate data from many frontend systems and make it consistent.

Multiple Consumers

Kafka is designed for multiple consumers to read any single stream of messages without interfering with each other.

Disk-Based Retention

Messages are committed to disk and stored with configurable retention rules. These options can be selected on a per-topic basis, allowing different streams of messages to have different amounts of retention depending on consumer needs.
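
Per-topic retention can be sketched as a time-based filter. The topic names, retention values, and `apply_retention` helper are hypothetical, and filtering individual messages is a simplification of what a broker does on disk.

```python
from typing import Dict, List, Tuple

Message = Tuple[float, bytes]  # hypothetical layout: (timestamp in seconds, payload)


def apply_retention(messages: List[Message], retention_seconds: float, now: float) -> List[Message]:
    """Keep only the messages that are no older than the retention window."""
    cutoff = now - retention_seconds
    return [m for m in messages if m[0] >= cutoff]


# Each topic carries its own retention setting (values invented for the example).
retention_by_topic: Dict[str, float] = {"metrics": 60.0, "audit": 3600.0}
stored = [(0.0, b"old"), (950.0, b"recent")]
metrics_kept = apply_retention(stored, retention_by_topic["metrics"], now=1000.0)
audit_kept = apply_retention(stored, retention_by_topic["audit"], now=1000.0)
```

The same stored data survives longer under the "audit" setting than under "metrics", which is the per-topic flexibility described above.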

Scalable

Kafka’s flexible scalability makes it easy to handle any amount of data. Expansions can be performed while the cluster is online, with no impact on the availability of the system as a whole. This also means that a cluster of multiple brokers can handle the failure of an individual broker and continue servicing clients.
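
How a cluster keeps serving clients after a broker failure can be pictured as choosing a new leader from the replicas that are still alive. This is a deliberately simplified sketch: `elect_leader` and the broker ids are invented, and a real cluster applies additional conditions before promoting a replica.

```python
from typing import List, Optional, Set


def elect_leader(replicas: List[int], live_brokers: Set[int]) -> Optional[int]:
    """Pick the first replica that is still alive, or None if every replica is down."""
    for broker_id in replicas:
        if broker_id in live_brokers:
            return broker_id
    return None


partition_replicas = [101, 102, 103]  # hypothetical brokers holding copies of one partition
leader_before = elect_leader(partition_replicas, {101, 102, 103})
leader_after = elect_leader(partition_replicas, {102, 103})  # broker 101 has failed
unavailable = elect_leader(partition_replicas, set())
```

As long as at least one replica survives, the partition stays available; only when every broker holding a copy is down does it become unreachable.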