Kafka (Part 1)

1. What’s Kafka?

Apache Kafka® is a distributed streaming platform, which has four core APIs:

  • Producer API
  • Consumer API
  • Streams API
  • Connector API

2. What’s a topic?

A topic is a category or feed name to which records are published. Topics in Kafka are always multi-subscriber; that is, a topic can have zero, one, or many consumers that subscribe to the data written to it.
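(A topic is the general name for the place where the messages produced by producers are stored; a topic can be subscribed to by zero or more consumers.)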
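To make the multi-subscriber idea concrete, here is a minimal in-memory sketch. The `Topic` class and its `publish`/`poll` methods are hypothetical names for illustration, not the Kafka client API, and the sketch deliberately ignores partitions: it only shows that every subscriber reads the full stream independently and that reading does not remove records.

```python
from collections import defaultdict


class Topic:
    """Toy in-memory topic: one shared, append-only list of records.

    Hypothetical illustration only; real Kafka topics are partitioned
    and stored on brokers, not held in a Python list.
    """

    def __init__(self, name):
        self.name = name
        self.records = []                  # shared log, written once
        self.positions = defaultdict(int)  # subscriber name -> next index to read

    def publish(self, value):
        self.records.append(value)

    def poll(self, subscriber):
        """Return every record this subscriber has not seen yet."""
        start = self.positions[subscriber]
        batch = self.records[start:]
        self.positions[subscriber] = len(self.records)
        return batch


topic = Topic("page-views")
topic.publish("view:/home")
topic.publish("view:/cart")

# Two independent subscribers each receive all records;
# reading does not remove anything from the topic.
print(topic.poll("analytics"))  # ['view:/home', 'view:/cart']
print(topic.poll("audit"))      # ['view:/home', 'view:/cart']
print(topic.poll("analytics"))  # []
```

A topic with zero subscribers is simply one that nobody calls `poll` on; the records are still stored.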
For each topic, the Kafka cluster maintains a partitioned log that looks like this:
(For each topic, the Kafka cluster maintains a partitioned log like the one shown in the figure below.)

[Figure: the partitioned log of a topic]

3. What’s a partition?

Each partition is an ordered, immutable sequence of records that is continually appended to—a structured commit log. The records in the partitions are each assigned a sequential id number called the offset that uniquely identifies each record within the partition.
(Records are appended to a partition in order, one after another, and each record in the partition is given a unique sequence number called its offset.)
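A partition can be modeled, roughly, as an append-only list in which a record's offset is its position. The sketch below assumes a single in-memory partition with no retention; `Partition`, `append`, `read`, and `end_offset` are hypothetical names, not Kafka's API.

```python
class Partition:
    """Toy partition: an ordered, append-only sequence of records.

    A record's offset is its position in the sequence, so offsets are
    sequential and unique within this partition only. There is no
    update method: once appended, a record is never modified.
    """

    def __init__(self):
        self._log = []

    def append(self, record):
        """Append a record and return the offset assigned to it."""
        self._log.append(record)
        return len(self._log) - 1

    def read(self, offset):
        """Return the record stored at the given offset."""
        # Reject negative offsets explicitly; Python lists would
        # otherwise silently index from the end.
        if not 0 <= offset < len(self._log):
            raise IndexError(f"offset {offset} is out of range")
        return self._log[offset]

    def end_offset(self):
        """Offset that the next appended record will receive."""
        return len(self._log)


p = Partition()
print(p.append("a"), p.append("b"), p.append("c"))  # 0 1 2
print(p.read(1))       # b
print(p.end_offset())  # 3
```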
The Kafka cluster durably persists all published records—whether or not they have been consumed—using a configurable retention period. For example, if the retention policy is set to two days, then for the two days after a record is published, it is available for consumption, after which it will be discarded to free up space. Kafka’s performance is effectively constant with respect to data size, so storing data for a long time is not a problem.
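(The Kafka cluster persists records whether or not they have been consumed; you can also configure a retention policy that controls how long they are kept.)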
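The two-day example above can be sketched as time-based deletion that ignores whether anyone has consumed a record. This is a simplified model under stated assumptions: timestamps are plain seconds, and records expire one at a time, whereas Kafka actually deletes whole log segments (governed by topic settings such as `retention.ms`). The class name `RetainedPartition` is hypothetical.

```python
from collections import deque

TWO_DAYS = 2 * 24 * 60 * 60  # retention period in seconds


class RetainedPartition:
    """Toy partition with time-based retention.

    Records are deleted once they are older than the retention period,
    whether or not any consumer has read them. Offsets are never reused:
    the log simply starts at a higher offset after old records expire.
    Simplification: real Kafka deletes whole log segments, not single records.
    """

    def __init__(self, retention_seconds):
        self.retention = retention_seconds
        self.log = deque()     # (timestamp, record) pairs, oldest first
        self.start_offset = 0  # offset of the oldest record still stored

    def append(self, record, timestamp):
        """Store a record and return its offset."""
        self.log.append((timestamp, record))
        return self.start_offset + len(self.log) - 1

    def enforce_retention(self, now):
        """Drop every record published more than `retention` seconds before `now`."""
        while self.log and now - self.log[0][0] > self.retention:
            self.log.popleft()
            self.start_offset += 1

    def read(self, offset):
        index = offset - self.start_offset
        if not 0 <= index < len(self.log):
            raise IndexError(f"offset {offset} is not available")
        return self.log[index][1]


p = RetainedPartition(TWO_DAYS)
p.append("old", timestamp=0)
p.append("new", timestamp=3 * 24 * 60 * 60)  # published on day 3
p.enforce_retention(now=3 * 24 * 60 * 60)
print(p.start_offset)  # 1: "old" expired even though nobody read it
print(p.read(1))       # new
```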


In fact, the only metadata retained on a per-consumer basis is the offset or position of that consumer in the log. This offset is controlled by the consumer: normally a consumer will advance its offset linearly as it reads records, but, in fact, since the position is controlled by the consumer it can consume records in any order it likes. For example, a consumer can reset to an older offset to reprocess data from the past or skip ahead to the most recent record and start consuming from “now”.
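(In fact, the only metadata kept per consumer is its offset in the log. Normally, a consumer's offset advances as it consumes records, but because the offset is entirely under the consumer's control, a consumer can also reset it in order to consume messages again.)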
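The following sketch models a consumer whose only state is its position in a single partition's log. The method names loosely echo `seek` and `seekToEnd` on the Java `KafkaConsumer`, but this `Consumer` class is a hypothetical stand-in, not that API; it just shows advancing linearly, rewinding to reprocess, and skipping ahead to consume from "now".

```python
class Consumer:
    """Toy consumer whose only state is its position (offset) in a log.

    `log` is a plain list standing in for one partition. Hypothetical
    class for illustration; it is not the Kafka client API.
    """

    def __init__(self, log):
        self.log = log
        self.position = 0  # offset of the next record to read

    def poll(self, max_records=10):
        batch = self.log[self.position:self.position + max_records]
        self.position += len(batch)  # normal case: advance linearly
        return batch

    def seek(self, offset):
        """Jump to any offset, e.g. back to reprocess old records."""
        if offset < 0:
            raise ValueError("offset must be non-negative")
        self.position = offset

    def seek_to_end(self):
        """Skip ahead so only records written from now on are read."""
        self.position = len(self.log)


log = ["r0", "r1", "r2", "r3"]
c = Consumer(log)
print(c.poll(2))  # ['r0', 'r1']
print(c.poll())   # ['r2', 'r3']
c.seek(1)
print(c.poll(2))  # ['r1', 'r2'] (reprocessing older records)
c.seek_to_end()
log.append("r4")
print(c.poll())   # ['r4']
```

Note that the log itself is never changed by reading; rewinding only moves this one consumer's position, so other consumers are unaffected.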
The partitions in the log serve several purposes. First, they allow the log to scale beyond a size that will fit on a single server. Each individual partition must fit on the servers that host it, but a topic may have many partitions, so it can handle an arbitrary amount of data. Second, they act as the unit of parallelism; more on that in a bit.
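(Partitions bring several benefits. First, they let a log grow beyond the size a single node can store, although each partition must still fit within the storage of the node that hosts it. Because a topic can have many partitions, a topic can handle a very large amount of data. Second, each partition is a unit of parallelism; more on that later.)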
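Both purposes of partitions can be illustrated with a small sketch: records are spread over several partitions by key, and each partition is then handled by its own worker. Assumptions: three partitions, an in-memory topic, and `zlib.crc32` as the hash. Kafka's default partitioner hashes keyed records with murmur2 instead, so real partition assignments will differ; `choose_partition` and `process` are hypothetical helpers.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

NUM_PARTITIONS = 3


def choose_partition(key, num_partitions=NUM_PARTITIONS):
    """Map a key to a partition; the same key always maps to the same partition.

    crc32 is used only because it ships with Python's standard library;
    Kafka's default partitioner uses murmur2.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# A topic as a list of partitions; each partition is its own ordered log.
topic = [[] for _ in range(NUM_PARTITIONS)]
events = [("alice", "login"), ("bob", "login"), ("alice", "click"),
          ("carol", "login"), ("bob", "logout"), ("alice", "logout")]
for key, value in events:
    topic[choose_partition(key)].append((key, value))


def process(partition_id):
    """Work on one partition; partitions are independent of each other."""
    return partition_id, len(topic[partition_id])


# One worker per partition: the partition is the unit of parallelism.
with ThreadPoolExecutor(max_workers=NUM_PARTITIONS) as pool:
    counts = dict(pool.map(process, range(NUM_PARTITIONS)))

print(sum(counts.values()))  # 6: every record landed in exactly one partition
```

Because a given key always maps to the same partition, records sharing a key (all of `alice`'s events here) stay in order relative to each other, while different partitions can be stored on different servers and processed independently.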
