Kafka Basic Concepts

Latest recommended article published on 2022-11-17 22:39:26

Danielka

Category: Big Data. Tags: kafka, hadoop

Article link: https://blog.csdn.net/qq_42564692/article/details/124177183

3 Key Capabilities

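Publish (write) and subscribe to (read) streams of events, including continuous import and export of data from and to other systems.
Store streams of events durably and reliably for as long as you want.
Process streams of events as they occur or retrospectively.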

Components

Servers

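Some servers form the storage layer and are called brokers.
Other servers run Kafka Connect to continuously import and export data as event streams, which lets Kafka integrate with existing systems such as relational databases, as well as with other Kafka clusters. Because a Kafka cluster is highly fault-tolerant and scalable, it can carry mission-critical workloads without data loss.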

Clients
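Clients let you write distributed applications and microservices that read, write, and process event streams in parallel. Clients are available for Java and Scala, including the higher-level Kafka Streams library, and for Go, Python, C/C++, and many other programming languages, as well as REST APIs.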

Concepts

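An event records the fact that something happened in your business. It is also called a record or a message. When you read data from or write data to Kafka, you do so in the form of events. Conceptually, an event has a key, a value, a timestamp, and optional metadata headers. Here is an example event: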

Event key: "Alice"
Event value: "Made a payment of $200 to Bob"
Event timestamp: "Jun 25, 2020 at 2:06 p.m."
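
To make the shape of an event concrete, the example above can be written as a small data structure. This is only an illustrative sketch in plain Python: the `Event` class and its field names are assumptions made for this example, not a class from any Kafka client library.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """Toy model of a Kafka event (record); not a Kafka client class."""
    key: str
    value: str
    timestamp: datetime
    headers: dict = field(default_factory=dict)  # optional metadata headers


# The example event from the text above.
payment = Event(
    key="Alice",
    value="Made a payment of $200 to Bob",
    timestamp=datetime(2020, 6, 25, 14, 6),  # Jun 25, 2020 at 2:06 p.m.
)
```

Headers are left empty here because they are optional; a real client would also serialize the key and value to bytes before sending.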

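Producers are the client applications that publish (write) events to Kafka. Consumers are the client applications that subscribe to (read and process) those events. In Kafka, producers and consumers are fully decoupled and unaware of each other, which is a key design element behind Kafka's high scalability. For example, producers never need to wait for consumers. Kafka also offers various guarantees, such as the ability to process events exactly once.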

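Events are organized and durably stored in topics. Put simply, a topic is similar to a folder in a filesystem, and the events are the files in that folder. Topics in Kafka are always multi-producer and multi-subscriber: a topic can have zero, one, or many producers that write events to it, and zero, one, or many consumers that subscribe to (read) those events. Events in a topic can be read as often as needed. Unlike in traditional messaging systems, events are not deleted after they are consumed. Instead, you define how long Kafka should retain events through a per-topic configuration setting, after which old events are discarded. Kafka's performance is effectively constant with respect to data size, so storing data for a long time is perfectly fine.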
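
The two properties described above (reading does not delete events, and only the retention setting removes them) can be sketched with a toy in-memory log. This is a simplified model under stated assumptions: `TopicLog`, `retention_seconds`, and the integer timestamps are hypothetical names for illustration, and a real broker applies retention to log segments on disk rather than to individual events.

```python
class TopicLog:
    """Toy append-only log: reading never removes events; only retention does."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self.base_offset = 0  # offset of the oldest retained event
        self.events = []      # list of (timestamp, value) in append order

    def append(self, timestamp, value):
        self.events.append((timestamp, value))
        return self.base_offset + len(self.events) - 1  # offset of the new event

    def read_from(self, offset):
        # Reads are non-destructive: they only slice the stored events.
        start = max(offset, self.base_offset) - self.base_offset
        return [value for _, value in self.events[start:]]

    def enforce_retention(self, now):
        # Drop events older than the retention window, oldest first.
        cutoff = now - self.retention_seconds
        while self.events and self.events[0][0] < cutoff:
            self.events.pop(0)
            self.base_offset += 1


log = TopicLog(retention_seconds=60)
log.append(0, "a")
log.append(30, "b")
log.append(90, "c")

first_read = log.read_from(0)    # one consumer reads everything
second_read = log.read_from(0)   # a second consumer (or a re-read) sees the same events
log.enforce_retention(now=100)   # events older than 60 seconds are dropped
after_retention = log.read_from(0)
```

Both reads return the same three events, and only the retention pass shortens the log. Each consumer would track its own offset, which is why consumers do not interfere with one another.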

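Topics are partitioned, meaning that a topic is spread over a number of "buckets" located on different Kafka brokers. This distributed placement of data is very important for scalability because it allows client applications to read and write data from and to many brokers at the same time. When a new event is published to a topic, it is actually appended to one of the topic's partitions. Events with the same event key (e.g., a customer or vehicle ID) are written to the same partition. Kafka guarantees that any consumer of a given topic-partition always reads that partition's events in exactly the same order as they were written.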

[Figure] Events with the same key (shown in the same color) are written to the same partition. Two producers can both write to the same partition if appropriate.
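
The key-to-partition rule can be sketched as a stable hash of the key reduced modulo the number of partitions. This is a minimal illustration, not Kafka's partitioner: `partition_for` and `NUM_PARTITIONS` are hypothetical names, and `zlib.crc32` is a stand-in for the hash function that Kafka's own clients use.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical partition count for this sketch


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stable hash of the key bytes, reduced modulo the partition count.
    # crc32 is a stand-in here, not the hash Kafka's clients use.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


partitions = defaultdict(list)  # partition index -> events in append order
for key, value in [("alice", "e1"), ("bob", "e2"), ("alice", "e3"), ("bob", "e4")]:
    partitions[partition_for(key)].append((key, value))

# All of alice's events sit in one partition, in the order they were written.
alice_events = [v for k, v in partitions[partition_for("alice")] if k == "alice"]
```

Because the hash depends only on the key, every event for one key lands in the same partition, which is what makes per-key ordering possible. Ordering across different partitions is not implied by this scheme.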

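To make data fault-tolerant and highly available, every topic can be replicated, even across geographic regions or data centers, so that there are always multiple brokers holding a copy of the data. A common setting is a replication factor of 3, meaning there are always three copies of the data. Replication is performed at the level of topic-partitions.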
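
As a rough illustration of what a replication factor of 3 means at the topic-partition level, the sketch below places each partition on three distinct brokers and then checks how many copies survive a single broker failure. The round-robin placement, the function names, and the broker names are all assumptions for this example; Kafka's actual replica assignment logic is more involved.

```python
def assign_replicas(num_partitions, brokers, replication_factor):
    """Toy round-robin placement: each partition gets distinct brokers."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    assignment = {}
    for p in range(num_partitions):
        # Start at a different broker for each partition and take consecutive ones.
        assignment[p] = [brokers[(p + i) % len(brokers)] for i in range(replication_factor)]
    return assignment


def surviving_copies(assignment, failed_broker):
    # Replicas that remain available for each partition after one broker fails.
    return {p: [b for b in replicas if b != failed_broker]
            for p, replicas in assignment.items()}


brokers = ["broker-0", "broker-1", "broker-2", "broker-3"]  # hypothetical broker names
assignment = assign_replicas(num_partitions=4, brokers=brokers, replication_factor=3)
after_failure = surviving_copies(assignment, "broker-1")
```

With three copies per partition, losing any one broker still leaves at least two copies of every partition in this model.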

Kafka APIs

In addition to command-line tooling, Kafka has five core APIs for Java and Scala:

The Admin API to manage and inspect topics, brokers, and other Kafka objects.
The Producer API to publish (write) a stream of events to one or more Kafka topics.
The Consumer API to subscribe to (read) one or more topics and to process the stream of events produced to them.
The Kafka Streams API to implement stream processing applications and microservices. It provides higher-level functions to process event streams, including transformations, stateful operations like aggregations and joins, windowing, processing based on event-time, and more. Input is read from one or more topics in order to generate output to one or more topics, effectively transforming the input streams to output streams.
The Kafka Connect API to build and run reusable data import/export connectors that consume (read) or produce (write) streams of events from and to external systems and applications so they can integrate with Kafka. For example, a connector to a relational database like PostgreSQL might capture every change to a set of tables.
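
The idea of a stateful operation that turns an input stream into an output stream can be sketched without any Kafka dependency. The example below is plain Python and does not use the Kafka Streams API: `running_totals`, `input_topic`, and `output_topic` are hypothetical names, and ordinary lists stand in for topics.

```python
from collections import defaultdict


def running_totals(input_events):
    """Toy stateful transformation: consume (key, amount), emit (key, running total)."""
    totals = defaultdict(int)  # per-key state kept by the processor
    for key, amount in input_events:
        totals[key] += amount
        yield key, totals[key]


input_topic = [("alice", 200), ("bob", 50), ("alice", 25)]  # hypothetical input stream
output_topic = list(running_totals(input_topic))
```

Each input event produces one output event carrying the updated aggregate for its key, which mirrors the "input topics in, output topics out" description above. A real stream processor would also persist this state and handle windowing and failures.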

Use Cases

Messaging
Website Activity Tracking
Metrics
Log Aggregation
Stream Processing
Event Sourcing
Commit Log
