Kafka-1-Getting Started

zhabit

Categories: Kafka, Zookeeper. Tags: kafka, Kafka cluster setup, ZooKeeper cluster setup, Kafka basics, Kafka internals

Article link: https://blog.csdn.net/wen524/article/details/87916385

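This post is mostly a translation of the official documentation. If you landed here by accident, you may prefer the original: the official Kafka documentation.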

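What is Apache Kafka?

Apache Kafka is a distributed streaming platform.

As such, it provides at least the three core capabilities of a streaming platform:

Publish and subscribe to streams of records, similar to a message queue or an enterprise messaging system.
Store streams of records durably, in a fault-tolerant way.
Process streams of records as they occur, in real time.

Kafka is most commonly used for two kinds of applications:

Building real-time streaming data pipelines that move data between systems. This is the part that resembles a message queue.
Building real-time streaming applications that react to and transform streams of data. This is what makes Kafka a streaming platform rather than just a messaging system.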

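Kafka core APIs

From a developer's point of view, using Kafka comes down to four APIs:

The Producer API publishes records to Kafka topics.
The Consumer API reads records from Kafka topics.
The Streams API processes streams, transforming input streams into output streams.
The Connect API (Kafka Connect) integrates Kafka with other data systems.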

Topics&Logs

Kafka organizes records by topic. Producers publish records to a specific topic, and consumers read records from the topics they subscribe to.

A topic is a category or feed name to which records are published

Topics in Kafka are multi-subscriber: a topic can have zero, one, or many consumers subscribed to it.

For each topic, Kafka stores records in a partitioned log.

Each partition is an ordered, immutable sequence of records that is continually appended to—a structured commit log.

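A topic can be split into multiple partitions. Each partition corresponds to one log (a structured commit log) that is ordered and append-only: existing records are never modified.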

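Highly recommended reading for understanding why Kafka stores records in a log: Using logs to build a solid data infrastructure (or: why dual writes are a bad idea)

Each record in a partition is assigned a sequential ID number, unique within that partition, called the offset.

Note: offsets are per partition, so Kafka guarantees ordering only within a partition, not across all the partitions of a topic.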

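A consumer controls what it reads by managing its own offset, and it can set that offset to any valid position in the log, for example rewinding to reprocess old records or skipping ahead to the most recent ones. Consumption logic is therefore entirely in the consumer's hands, which matters both for decoupling systems and for retrying after failures.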
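
The idea of a consumer-owned offset can be sketched with a toy in-memory model. This is only an illustration, not Kafka's implementation or client API: `PartitionLog`, `Consumer`, `poll`, and `seek` are hypothetical names invented here, and the "log" is a plain Python list.

```python
class PartitionLog:
    """Toy model of one partition: an append-only list of records."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Append a record and return the offset assigned to it."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Return up to max_records records starting at the given offset."""
        return self._records[offset:offset + max_records]


class Consumer:
    """Toy consumer that owns its own position (offset) in the log."""

    def __init__(self, log, offset=0):
        self.log = log
        self.offset = offset

    def poll(self, max_records=10):
        """Read the next batch and advance this consumer's offset only."""
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)
        return batch

    def seek(self, offset):
        """Jump to another offset, e.g. rewind to reprocess after a failure."""
        self.offset = offset


log = PartitionLog()
for msg in ["a", "b", "c"]:
    log.append(msg)

consumer_a = Consumer(log)            # starts from the beginning
consumer_b = Consumer(log, offset=2)  # starts near the end

first_a = consumer_a.poll()   # ['a', 'b', 'c']
first_b = consumer_b.poll()   # ['c']
consumer_a.seek(1)            # rewind A; B's offset is untouched
replay_a = consumer_a.poll()  # ['b', 'c']
```

Because the log itself never changes when it is read, rewinding one consumer has no effect on any other consumer.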

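In the original post's figure, consumers A and B each read at their own offset; neither needs to know anything about the other's state.

Why partition?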

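There are two main reasons for partitioning:

Capacity. The log a single server can hold is limited, but once a topic is partitioned, its partitions can be spread across many hosts in the cluster.
Throughput. Partitions are the unit of parallelism, so they speed up processing.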

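Distribution

The partitions of each topic are distributed over the servers in the cluster, with each server handling a share of the partitions.
Each partition is also replicated: a configurable number of replicas (the replication factor) is kept on different servers for fault tolerance.
Each partition has one leader server and zero or more follower servers.
The leader handles all read and write requests for the partition, while the followers passively replicate the leader.
If the leader fails, one of the followers is elected as the new leader.

Unlike HDFS, with its NameNode and DataNodes, a Kafka cluster has no separate master and worker roles for data; the brokers are roughly equal peers.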

Each server acts as a leader for some of its partitions and a follower for others so load is well balanced within the cluster.
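
To make the load-balancing claim concrete, here is a minimal sketch of spreading replicas over brokers so that leadership ends up evenly shared. It is a deliberately simplified stand-in, not Kafka's actual assignment algorithm; `place_replicas` is a hypothetical helper, and the first broker in each replica list is assumed to be the leader.

```python
from collections import Counter


def place_replicas(num_partitions, brokers, replication_factor):
    """Assign each partition a list of brokers; the first entry acts as leader."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    layout = {}
    for p in range(num_partitions):
        # Shift the starting broker by one for every partition.
        layout[p] = [brokers[(p + r) % len(brokers)]
                     for r in range(replication_factor)]
    return layout


layout = place_replicas(num_partitions=6, brokers=[1, 2, 3], replication_factor=2)
leaders = Counter(replicas[0] for replicas in layout.values())
followers = Counter(b for replicas in layout.values() for b in replicas[1:])
```

With six partitions on three brokers, each broker leads two partitions and follows two others in this layout.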

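With partition logs and replicas, Kafka provides the following guarantees:

Records sent to the same partition are kept in the order they were sent, because the partition log is ordered and append-only.
A consumer reading that partition sees the records in the order they are stored in the log.
A topic with replication factor N tolerates up to N-1 server failures without losing any record committed to the log. The cost is that a record only counts as committed once all in-sync replicas have it, so a producer that wants this guarantee has to wait for that acknowledgement (acks=all). In that sense Kafka leans toward CP (consistency over availability), at least with these settings.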
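
The N-1 figure can be checked with a toy failover model. It assumes every replica is fully caught up when failures happen and picks the next leader naively from the remaining in-sync replicas; the real election is done by the Kafka controller. `ReplicatedPartition` is a hypothetical class written for this illustration.

```python
class ReplicatedPartition:
    """Toy model of one partition with a leader and an in-sync replica set."""

    def __init__(self, replicas):
        self.replicas = list(replicas)  # broker ids; first one starts as leader
        self.isr = list(replicas)       # replicas assumed fully in sync
        self.leader = self.replicas[0]

    def fail(self, broker_id):
        """Mark a broker as failed and elect a new leader if it was leading."""
        if broker_id in self.isr:
            self.isr.remove(broker_id)
        if self.leader == broker_id:
            self.leader = self.isr[0] if self.isr else None

    @property
    def available(self):
        return self.leader is not None


partition = ReplicatedPartition([1, 3, 2])  # replication factor N = 3
partition.fail(1)                           # leadership moves to broker 3
partition.fail(3)                           # leadership moves to broker 2
survives_two_failures = partition.available
partition.fail(2)                           # the N-th failure
survives_three_failures = partition.available
```

The partition stays available through two failures and becomes unavailable only when the third and last replica is gone.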

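Producers

The producer chooses which topic to publish each record to.
Internally, it must also decide which partition of that topic the record goes to.
This can be done round-robin to balance load across partitions, or with a custom partition function (for example, one based on the record key).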
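
The two strategies can be sketched as follows. This is an illustrative stand-in, not the client's partitioner: `make_partitioner` is a hypothetical helper, and `zlib.crc32` is used here only as a convenient deterministic hash (the Java client hashes the key bytes with murmur2).

```python
import itertools
import zlib


def make_partitioner(num_partitions):
    """Return a function that picks a partition for a record key."""
    counter = itertools.count()

    def choose(key=None):
        if key is None:
            # No key: rotate through the partitions to balance load.
            return next(counter) % num_partitions
        # With a key: hash it, so equal keys always land in the same partition.
        return zlib.crc32(key.encode("utf-8")) % num_partitions

    return choose


choose = make_partitioner(3)
keyless = [choose() for _ in range(6)]  # [0, 1, 2, 0, 1, 2]
same_partition = choose("user-42") == choose("user-42")
```

Key-based partitioning is what lets all records for one key stay in order, since ordering is guaranteed only within a partition.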

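Consumers

Kafka consumers are independent of both producers and the Kafka brokers. They run as separate processes, for example applications using the Java client.

Kafka organizes consumers into consumer groups.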

Each record published to a topic is delivered to one consumer instance within each subscribing consumer group.

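Across consumer groups, records are broadcast: every subscribing group receives every record, so each group consumes all partitions of the topic.
Within a consumer group, records are load balanced over the consumer instances, so each instance consumes only a subset of the partitions. Kafka divides the partitions among the instances in the group, so that each instance is the exclusive consumer of its share of partitions at any point in time.

Kafka maintains group membership dynamically. When a new instance joins the group, it takes over some partitions from the other members; when an instance leaves the group or dies, its partitions are redistributed among the remaining instances.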
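
A rough sketch of how partitions might be divided within one group, and how the division changes when membership changes. It uses a simplified round-robin rule; Kafka's real assignment strategies are configurable and differ in detail. `assign` is a hypothetical function for illustration.

```python
def assign(partitions, consumers):
    """Spread partitions over the group's consumers; return {consumer: [partitions]}."""
    members = sorted(consumers)
    assignment = {c: [] for c in members}
    if not members:
        return assignment
    for i, p in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(p)
    return assignment


partitions = [0, 1, 2, 3]

two_members = assign(partitions, ["c1", "c2"])
# {'c1': [0, 2], 'c2': [1, 3]}

# A third consumer joins: it takes over part of the load.
three_members = assign(partitions, ["c1", "c2", "c3"])
# {'c1': [0, 3], 'c2': [1], 'c3': [2]}

# c1 and c3 leave: the survivor picks up every partition.
one_member = assign(partitions, ["c2"])
# {'c2': [0, 1, 2, 3]}

# More consumers than partitions: the extra consumer stays idle.
five_members = assign(partitions, ["c1", "c2", "c3", "c4", "c5"])
```

Every partition has exactly one owner inside the group, which is why adding more consumers than partitions does not increase parallelism.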

Quick Start

Install

tar -xzf kafka_2.11-2.1.0.tgz
cd kafka_2.11-2.1.0

Start (ZooKeeper first, then the Kafka broker, each in its own terminal)

bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties

Create a topic

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
bin/kafka-topics.sh --list --zookeeper localhost:2181

Send messages with the console producer

bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
This is a message
This is another message

Consume messages with the console consumer

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
This is a message
This is another message

Setting up a ZooKeeper cluster

1. Download and extract

tar -zxvf zookeeper-3.4.13.tar.gz

2. Basic configuration

cd zookeeper-3.4.13
cp conf/zoo_sample.cfg conf/zoo.cfg
vi conf/zoo.cfg

Edit the file so that it reads as follows, then save:

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial synchronization phase can take
initLimit=10
# The number of ticks that can pass between sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
dataDir=/var/data/zookeeper
# log directory
dataLogDir=/var/log/zookeeper
# the port at which the clients will connect
clientPort=2181
# cluster servers,server.id unique id
server.1=10.90.14.26:2888:3888
server.2=10.90.14.24:2888:3888
server.3=10.90.14.25:2888:3888
# the maximum number of client connections.
#increase this if you need to handle more clients
#maxClientCnxns=60
# Be sure to read the maintenance section of the 
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1

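Note: if the directories given in dataDir and dataLogDir do not exist, create them first.

Note: the server.id entries here are server.1, server.2, and server.3, meaning the cluster has three nodes with IDs 1, 2, and 3. Each host therefore needs a myid file in its data directory containing its own ID:

# on the node 1 host (10.90.14.26)
echo "1" > /var/data/zookeeper/myid
# on the node 2 host (10.90.14.24)
echo "2" > /var/data/zookeeper/myid
# on the node 3 host (10.90.14.25)
echo "3" > /var/data/zookeeper/myid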
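
Since the ID in each myid file has to match the server.N line for that host, it can be derived from zoo.cfg rather than typed by hand. The following is a small sketch under that assumption; `server_ids` is a hypothetical helper, and the config text is a shortened copy of the file above.

```python
import re


def server_ids(cfg_text):
    """Map host -> id from the server.<id>=<host>:<port>:<port> lines of zoo.cfg."""
    ids = {}
    for line in cfg_text.splitlines():
        match = re.match(r"\s*server\.(\d+)\s*=\s*([^:\s]+):\d+:\d+", line)
        if match:
            ids[match.group(2)] = int(match.group(1))
    return ids


cfg = """\
# cluster servers
clientPort=2181
server.1=10.90.14.26:2888:3888
server.2=10.90.14.24:2888:3888
server.3=10.90.14.25:2888:3888
"""

ids = server_ids(cfg)
my_id = ids["10.90.14.24"]  # the IP of the host this runs on
```

On each host, the resulting `my_id` is the single number to write into the myid file under dataDir.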

3. Start the service

# run on every host; zkServer.sh start already launches the server in the background
bin/zkServer.sh start

4. Check cluster status

bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper-3.4.13/bin/../conf/zoo.cfg
Mode: follower

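In this experiment the second node turned out to be the leader. I have not looked into ZooKeeper's leader election algorithm yet; that is something to study later.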
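
To find the leader without logging in to every host, each node's role can also be read over the client port. The sketch below assumes the `srvr` four-letter command is enabled on port 2181 and that its reply contains a `Mode:` line like the one printed by zkServer.sh status; `four_letter_word` and `parse_mode` are hypothetical helper names.

```python
import socket


def four_letter_word(host, port=2181, cmd="srvr", timeout=3.0):
    """Send a four-letter command to a ZooKeeper server and return its reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(reply):
    """Extract the value of the 'Mode:' line (leader, follower, standalone)."""
    for line in reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


# Example usage against the three nodes of this cluster (not run here):
#   for host in ("10.90.14.26", "10.90.14.24", "10.90.14.25"):
#       print(host, parse_mode(four_letter_word(host)))
```

Exactly one node should report leader; the others should report follower.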

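Setting up a Kafka cluster

1. Download and install

tar -xzf kafka_2.11-2.1.0.tgz
cd kafka_2.11-2.1.0

2. Minimal configuration
On each host, make the following changes to config/server.properties (all other settings stay at their defaults for now). Note that host.name is deprecated in this Kafka version in favor of listeners and advertised.listeners.

Host 1 (10.90.14.26):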

broker.id=1
host.name=10.90.14.26
log.dirs=/var/data/kafka-logs
zookeeper.connect=10.90.14.24:2181,10.90.14.25:2181,10.90.14.26:2181

Host 2 (10.90.14.24):

broker.id=2
host.name=10.90.14.24
log.dirs=/var/data/kafka-logs
zookeeper.connect=10.90.14.24:2181,10.90.14.25:2181,10.90.14.26:2181

Host 3 (10.90.14.25):

broker.id=3
host.name=10.90.14.25
log.dirs=/var/data/kafka-logs
zookeeper.connect=10.90.14.24:2181,10.90.14.25:2181,10.90.14.26:2181

3. Start the service

# start the broker on all three hosts:
nohup bin/kafka-server-start.sh config/server.properties &

4. Create a topic to test

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 1 --topic test-topic
Created topic "test-topic".

bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic test-topic
Topic:test-topic        PartitionCount:1        ReplicationFactor:3     Configs:
        Topic: test-topic       Partition: 0    Leader: 1       Replicas: 1,3,2 Isr: 1,3,2
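
In the describe output, Replicas lists every broker that holds a copy of the partition and Isr lists the ones currently in sync, so a healthy partition shows the same brokers in both. A sketch of checking this automatically, assuming the output format shown above; `under_replicated` is a hypothetical helper.

```python
import re

PARTITION_LINE = re.compile(
    r"Partition:\s*(\d+)\s+Leader:\s*(-?\d+)\s+Replicas:\s*([\d,]+)\s+Isr:\s*([\d,]*)"
)


def under_replicated(describe_output):
    """Return the partitions whose ISR is smaller than their replica list."""
    lagging = []
    for line in describe_output.splitlines():
        match = PARTITION_LINE.search(line)
        if not match:
            continue  # skips the per-topic summary line
        replicas = set(match.group(3).split(","))
        isr = set(filter(None, match.group(4).split(",")))
        if len(isr) < len(replicas):
            lagging.append(int(match.group(1)))
    return lagging


healthy = (
    "Topic:test-topic\tPartitionCount:1\tReplicationFactor:3\tConfigs:\n"
    "\tTopic: test-topic\tPartition: 0\tLeader: 1\tReplicas: 1,3,2\tIsr: 1,3,2\n"
)
# Hypothetical output after broker 1 has gone down:
degraded = "\tTopic: test-topic\tPartition: 0\tLeader: 3\tReplicas: 1,3,2\tIsr: 3,2\n"

healthy_result = under_replicated(healthy)    # []
degraded_result = under_replicated(degraded)  # [0]
```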

5. Producer and consumer

# any broker will do
bin/kafka-console-producer.sh --broker-list 10.90.14.24:9092 --topic test-topic
>this is a test message
>this is another test message
^C

# any broker will do
bin/kafka-console-consumer.sh --bootstrap-server 10.90.14.26:9092 --from-beginning --topic test-topic
this is a test message
this is another test message
^CProcessed a total of 2 messages