Kafka (Part 1)

朱红旭

1. What’s Kafka?

Apache Kafka® is a distributed streaming platform, which has four core APIs:

Producer API
Consumer API
Streams API
Connector API

2. What’s a topic?

A topic is a category or feed name to which records are published. Topics in Kafka are always multi-subscriber; that is, a topic can have zero, one, or many consumers that subscribe to the data written to it.
(A topic is the general name for the place where producers' messages are stored; a topic can be subscribed to by zero or more consumers.)
For each topic, the Kafka cluster maintains a partitioned log that looks like this:
(For each topic, the Kafka cluster maintains a partitioned log like the one in the figure below:)

3. What’s a partition?

Each partition is an ordered, immutable sequence of records that is continually appended to—a structured commit log. The records in the partitions are each assigned a sequential id number called the offset that uniquely identifies each record within the partition.
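(Records are appended to a partition one after another, in order. Each record is given a sequence number that is unique within that partition; this number is called the offset.)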
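To make the offset idea concrete, here is a minimal sketch of a partition as an append-only list. This is a toy in-memory model, not Kafka's implementation or API; the `Partition` class and its `append` and `read` methods are hypothetical names chosen for illustration.

```python
class Partition:
    """Toy stand-in for a Kafka partition: an append-only sequence of records."""

    def __init__(self):
        self._records = []

    def append(self, value):
        """Append a record and return the offset assigned to it."""
        offset = len(self._records)  # next sequential id within this partition
        self._records.append(value)
        return offset

    def read(self, offset):
        """Return the record stored at the given offset."""
        return self._records[offset]


partition = Partition()
offsets = [partition.append(v) for v in ["a", "b", "c"]]
print(offsets)            # [0, 1, 2]
print(partition.read(1))  # b
```

Records are never modified in place here; the only write operation is `append`, which is what makes the offset a stable identifier for a record within its partition.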
The Kafka cluster durably persists all published records—whether or not they have been consumed—using a configurable retention period. For example, if the retention policy is set to two days, then for the two days after a record is published, it is available for consumption, after which it will be discarded to free up space. Kafka’s performance is effectively constant with respect to data size so storing data for a long time is not a problem.
(Kafka persists records whether or not they have been consumed, and a configurable retention policy determines how long they are kept.)
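The two-day retention example can be sketched as follows. This is a simplified model under stated assumptions: `RetainedLog` and its methods are hypothetical, timestamps are passed in explicitly as seconds, and records are discarded one at a time (real Kafka removes whole log segments rather than individual records).

```python
from collections import deque


class RetainedLog:
    """Toy log that drops records older than a retention period."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self._records = deque()  # entries are (offset, timestamp, value)
        self._next_offset = 0

    def append(self, value, now):
        offset = self._next_offset
        self._records.append((offset, now, value))
        self._next_offset += 1
        return offset

    def enforce_retention(self, now):
        """Discard records whose age exceeds the retention period."""
        while self._records and now - self._records[0][1] > self.retention_seconds:
            self._records.popleft()

    def available_offsets(self):
        return [offset for offset, _, _ in self._records]


TWO_DAYS = 2 * 24 * 60 * 60

log = RetainedLog(TWO_DAYS)
log.append("old", now=0)
log.append("new", now=TWO_DAYS)
log.enforce_retention(now=TWO_DAYS + 1)
print(log.available_offsets())  # [1]
```

Note that the surviving record keeps offset 1: discarding old data does not renumber what remains, and whether a record was consumed plays no part in the decision.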

In fact, the only metadata retained on a per-consumer basis is the offset or position of that consumer in the log. This offset is controlled by the consumer: normally a consumer will advance its offset linearly as it reads records, but, in fact, since the position is controlled by the consumer it can consume records in any order it likes. For example a consumer can reset to an older offset to reprocess data from the past or skip ahead to the most recent record and start consuming from “now”.
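(In fact, the only per-consumer metadata kept is the consumer's offset in the log. Normally the offset advances as the consumer reads each record, but because the consumer fully controls it, the consumer can also reset the offset to consume records again.)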
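A small sketch of consumer-controlled offsets, assuming the log is just a Python list. The `Consumer` class and the method names `poll`, `seek`, and `seek_to_end` are hypothetical helpers for this illustration, not the Kafka client API.

```python
class Consumer:
    """Toy consumer whose only state is its position (offset) in the log."""

    def __init__(self, log):
        self.log = log
        self.position = 0

    def poll(self):
        """Return the next record and advance the offset, or None at the end."""
        if self.position >= len(self.log):
            return None
        record = self.log[self.position]
        self.position += 1
        return record

    def seek(self, offset):
        """Jump to an arbitrary offset, e.g. to reprocess old records."""
        self.position = offset

    def seek_to_end(self):
        """Skip ahead so that only records appended from now on are read."""
        self.position = len(self.log)


log = ["r0", "r1", "r2"]
consumer = Consumer(log)

first_pass = [consumer.poll(), consumer.poll()]  # normal linear reading
consumer.seek(0)                                 # rewind to the start
replayed = consumer.poll()                       # reads "r0" a second time

consumer.seek_to_end()                           # start consuming from "now"
log.append("r3")
latest = consumer.poll()                         # reads "r3", skipping "r2"
```

Because the log itself is unchanged by reading, rewinding or skipping costs nothing on the log side; only the consumer's own position moves.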
The partitions in the log serve several purposes. First, they allow the log to scale beyond a size that will fit on a single server. Each individual partition must fit on the servers that host it, but a topic may have many partitions so it can handle an arbitrary amount of data. Second they act as the unit of parallelism—more on that in a bit.
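(Partitioning has several benefits. First, it lets a log grow beyond what a single node can store, although each individual partition must still fit on the node that hosts it; because a topic can have many partitions, it can handle a very large volume of data. Second, partitions act as the unit of parallelism, which is discussed later.)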
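As a rough illustration of how one topic spreads over several partitions, the sketch below routes each record to a partition by hashing its key. The `Topic` class and `choose_partition` function are hypothetical, and CRC32 is used only because it is deterministic and in the Python standard library; it is an assumption for this example, not the hash Kafka's own partitioner uses.

```python
import zlib


def choose_partition(key, num_partitions):
    """Map a key to a partition index with a deterministic hash (CRC32 here)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class Topic:
    """Toy topic: a fixed number of independent append-only partitions."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def publish(self, key, value):
        """Append the value to the key's partition; return (partition, offset)."""
        index = choose_partition(key, len(self.partitions))
        self.partitions[index].append(value)
        return index, len(self.partitions[index]) - 1


topic = Topic(num_partitions=3)
p1, o1 = topic.publish("user-1", "login")
p2, o2 = topic.publish("user-1", "logout")

# Records with the same key land in the same partition, in order.
print(p1 == p2, o2 == o1 + 1)  # True True
```

Each partition is a separate list, so in a real cluster each could live on a different server and be read by a different consumer at the same time, which is the sense in which a partition is the unit of parallelism.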
