Kafka Introduction
Publish & Subscribe
Kafka is used for building real-time data pipelines and streaming apps. It is horizontally scalable, fault-tolerant, wicked fast, and runs in production in thousands of companies.

Store

Store streams of data safely in a distributed, replicated, fault-tolerant cluster.
We can think of Kafka as a message middleware (a buffer) that sits between message producers and consumers (a message may have many different producers and consumers). In our setup the pieces map onto Kafka as follows:
- producer: the test machines, which generate large volumes of test results
- consumer: YMS, which analyzes and stores those test results
- broker: the middleware application between the test machines and YMS
- topic: a label attached to the test results, e.g. test files under topic_inline, topic_dev, and topic_offline
Introduction
Apache Kafka is a distributed streaming platform. What exactly does that mean?
A streaming platform has three key capabilities:
- Publish and subscribe to streams of records, similar to a message queue or enterprise messaging system (messaging)
- Store streams of records in a fault-tolerant durable way (fault tolerance)
- Process streams of records as they occur (real time)
Kafka is generally used for two broad classes of applications:
- Building real-time streaming data pipelines that reliably get data between systems or applications
- Building real-time streaming applications that transform or react to the streams of data
First a few concepts:
- Kafka is run as a cluster on one or more servers (so storage can scale beyond a single machine) that can span multiple datacenters.
- The Kafka cluster stores streams of records in categories called topics (the streams are organized by topic).
- Each record consists of a key, a value, and a timestamp.
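The record structure above can be pictured as a plain (key, value, timestamp) triple. A toy sketch for intuition only, not Kafka's actual serialization format (the field values are hypothetical):

```python
import time
from typing import NamedTuple, Optional

class Record(NamedTuple):
    """Toy model of a Kafka record: a key, a value, and a timestamp."""
    key: Optional[str]   # optional; when present, it influences partition choice
    value: str
    timestamp: float

# e.g. a test result keyed by a (hypothetical) lot id
r = Record(key="lot-42", value="PASS", timestamp=time.time())
print(r.key, r.value)
```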
Kafka has four core APIs:
- The Producer API allows an application to publish a stream of records to one or more Kafka topics.
- The Consumer API allows an application to subscribe to one or more topics and process the stream of records produced to them.
- The Streams API allows an application to act as a stream processor, consuming an input stream from one or more topics and producing an output stream to one or more output topics, effectively transforming the input streams to output streams.
- The Connector API allows building and running reusable producers or consumers that connect Kafka topics to existing applications or data systems. For example, a connector to a relational database might capture every change to a table.
Topics and Logs
Let's first dive into the core abstraction Kafka provides for a stream of records: the topic. A topic is a category or feed name to which records are published. Topics in Kafka are always multi-subscriber; that is, a topic can have zero, one, or many consumers that subscribe to the data written to it.
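To make "multi-subscriber" concrete, here is a toy in-memory topic: an append-only log that any number of consumers read independently, each tracking its own offset. This illustrates the model only; it is not how Kafka is implemented:

```python
class ToyTopic:
    """Append-only record log with per-consumer read offsets."""

    def __init__(self):
        self.log = []       # the ordered, append-only log of records
        self.offsets = {}   # consumer name -> next offset to read

    def publish(self, record):
        self.log.append(record)

    def poll(self, consumer):
        """Return this consumer's unread records and advance its offset.
        A new consumer starts at offset 0 (like --from-beginning)."""
        start = self.offsets.get(consumer, 0)
        self.offsets[consumer] = len(self.log)
        return self.log[start:]

topic = ToyTopic()
topic.publish("This is a message")
topic.publish("This is another message")

# Two independent subscribers each see the full stream.
print(topic.poll("yms"))        # both messages
print(topic.poll("dashboard"))  # both messages again
print(topic.poll("yms"))        # nothing new: []
```

Because each consumer owns its offset, adding a subscriber never removes data from another subscriber's view.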
Deploying and using Kafka
- Single node, single broker
- Single node, multiple brokers
- Multiple nodes, multiple brokers

Configuring a single node with a single broker
Kafka depends on ZooKeeper, so install ZooKeeper before using Kafka:

```bash
export ZK_HOME=....
export PATH=$ZK_HOME/bin:$PATH
```
Configure conf/zoo.cfg, in particular the data directory:

```properties
dataDir=...
```
Start ZooKeeper, either with Kafka's bundled script or the standalone ZooKeeper one:

```bash
bin/zookeeper-server-start.sh config/zookeeper.properties
# or
bin/zkServer.sh start
```

`jps` should now show a QuorumPeerMain process, which indicates ZooKeeper started successfully. You can connect to it with zkCli.sh.
Configure Kafka:

```bash
export KAFKA_HOME=...
export PATH=$KAFKA_HOME/bin:$PATH
```
Configure $KAFKA_HOME/config/server.properties:

```properties
broker.id=0                       # unique id of this broker
listeners=PLAINTEXT://:9092       # listener port, 9092 by default
host.name=localhost               # this machine
log.dirs=/tmp/kafka-logs          # where Kafka stores its log files
num.partitions=1                  # default number of partitions per topic
zookeeper.connect=localhost:2181  # ZooKeeper address
```
Start Kafka:

```bash
bin/kafka-server-start.sh config/server.properties
```

`jps` (or `jps -m`) should now show an additional Kafka process.
Create a topic. Creating a topic requires the ZooKeeper address, the replication factor, and the number of partitions:

```bash
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic inline
```
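Partitions matter because a producer assigns each keyed record to a partition deterministically, so all records with the same key stay in order within one partition. The sketch below uses crc32 purely as an illustration of the idea; Kafka's default partitioner actually hashes the key bytes with murmur2, and keyless records are spread across partitions:

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Illustrative partitioner: hash the key, take it modulo the
    partition count. Same key -> same partition, every time."""
    return zlib.crc32(key) % num_partitions

# Records sharing a key always land in the same partition,
# which is what gives Kafka its per-key ordering guarantee.
print(partition_for(b"lot-42", 3) == partition_for(b"lot-42", 3))  # True
```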
List and describe topics:

```bash
bin/kafka-topics.sh --list --zookeeper localhost:2181
bin/kafka-topics.sh --describe --zookeeper localhost:2181
bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic inline
```
Send messages (the producer generates them):

```bash
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic inline
This is a message
This is another message
```
Consume messages (note that this console consumer connects through the ZooKeeper port):

```bash
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic inline --from-beginning
This is a message
This is another message
```

`--from-beginning` replays the topic from the start; without it, the consumer only sees messages produced after it starts.
Configuring a single node with multiple brokers

The broker.id property is the unique and permanent name of each node in the cluster.
server-1.properties:

```properties
log.dirs=/home/hadoop/app/tmp/kafka-logs-1
listeners=PLAINTEXT://:9093
broker.id=1
```

server-2.properties:

```properties
log.dirs=/home/hadoop/app/tmp/kafka-logs-2
listeners=PLAINTEXT://:9094
broker.id=2
```

server-3.properties:

```properties
log.dirs=/home/hadoop/app/tmp/kafka-logs-3
listeners=PLAINTEXT://:9095
broker.id=3
```
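Only three properties differ between those files (broker.id, the listener port, and log.dirs), so they can be generated rather than hand-edited. A small sketch; the paths, ports, and server-N.properties filenames simply mirror the examples above:

```python
from pathlib import Path

def write_broker_configs(outdir: Path, broker_ids=(1, 2, 3), base_port=9092):
    """Write one server-N.properties per broker; only broker.id,
    the listener port, and log.dirs vary between brokers on one node."""
    for n in broker_ids:
        cfg = (
            f"log.dirs=/home/hadoop/app/tmp/kafka-logs-{n}\n"
            f"listeners=PLAINTEXT://:{base_port + n}\n"
            f"broker.id={n}\n"
        )
        (outdir / f"server-{n}.properties").write_text(cfg)

write_broker_configs(Path("."))
print(Path("server-1.properties").read_text())
```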
Start the three brokers (-daemon runs them in the background):

```bash
kafka-server-start.sh -daemon $KAFKA_HOME/config/server-1.properties &
kafka-server-start.sh -daemon $KAFKA_HOME/config/server-2.properties &
kafka-server-start.sh -daemon $KAFKA_HOME/config/server-3.properties &
```
Create a replicated topic, then produce to it, consume from it, and inspect it:

```bash
kafka-topics.sh --create --zookeeper hadoop000:2181 --replication-factor 3 --partitions 1 --topic my-replicated-topic
kafka-console-producer.sh --broker-list hadoop000:9093,hadoop000:9094,hadoop000:9095 --topic my-replicated-topic
kafka-console-consumer.sh --zookeeper hadoop000:2181 --topic my-replicated-topic
kafka-topics.sh --describe --zookeeper hadoop000:2181 --topic my-replicated-topic
```