参考资料
官网
《深入理解Kafka 核心设计与实践原理》朱忠华著
什么是Kafka
Apache Kafka® is a distributed streaming platform
A streaming platform has three key capabilities:
Publish and subscribe to streams of records, similar to a message queue or enterprise messaging system.
Store streams of records in a fault-tolerant durable way.
Process streams of records as they occur.
Kafka is generally used for two broad classes of applications:
Building real-time streaming data pipelines that reliably get data between systems or applications
Building real-time streaming applications that transform or react to the streams of data
To understand how Kafka does these things, let's dive in and explore Kafka's capabilities from the bottom up.
First a few concepts:
Kafka is run as a cluster on one or more servers that can span multiple datacenters.
The Kafka cluster stores streams of records in categories called topics.
Each record consists of a key, a value, and a timestamp.
Kafka是一款分布式消息发布和订阅系统,它的特点是高性能、高吞吐量。
最早设计的目的是作为LinkedIn的活动流和运营数据的处理管道。这些数据主要是用来对用户做用户画
像分析以及服务器性能数据的一些监控所以kafka一开始设计的目标就是作为一个分布式、高吞吐量的消息系统,所以适合运用在大数据传输场景。如的开源分布式处理系统如cloudera 、Storm 、Spark、 Flink 等都支持与 Kafka 集成
Kafka 起初是由 Linkedin 公司采用
Scala
语言开发的 个多分区、多副本
且基于 ZooKeeper 协调的分布式消息系统
目前 Kafka 已经定位为一个分布式流式处理平台,它以高吞吐、可持久化、可水平扩展、支持流数据