Kafka:
Kafka is a high-throughput, distributed publish-subscribe messaging system. According to the official Kafka site, Kafka is now positioned as a distributed streaming platform: it scales horizontally, delivers high throughput, and a growing number of open-source distributed processing systems (Flume, Apache Storm, Spark) support integration with Kafka.
Kafka is a distributed message queue. Kafka categorizes stored messages by topic; a message sender is called a producer and a message receiver a consumer. A Kafka cluster is made up of multiple Kafka instances, and each instance (server) is called a broker.
A Kafka cluster depends on ZooKeeper, which stores metadata to keep the system available.
Components:
- kafka server: the messaging middleware; accepts messages from producers and subscriptions from consumers.
- Topic: holds messages of one category; can be thought of as a queue.
- Broker: one Kafka server is one broker, a node of the cluster; a cluster consists of multiple brokers, and one broker can hold multiple topics.
- Leader: each partition has a leader replica, which handles writes and answers consumers.
- Follower: keeps a backup of the leader's data and can serve read requests; if the leader goes down, a follower is elected as the new leader.
- Consumer group: Kafka's mechanism for both broadcasting and unicasting a topic's messages. A group is a logical consumer containing multiple consumer instances; a message in a partition is handled by only one instance of each group, which prevents the same message from being processed twice within a group.
- consumer: a message consumer; subscribes to messages in Kafka.
- producer: a message producer; produces messages.
- partition: the unit that actually stores messages, similar to a queue in RabbitMQ.
- offset: a position marker for the data in a partition; after a message is consumed, it still remains in the append log for a while.
- consumer id registry: every consumer has its own identifier; this registry stores those consumer ids.
- consumer offset tracking: tracks the highest offset each consumer has consumed.
- partition owner registry: records which consumer a partition is being consumed by.
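The consumer-group behavior above can be rehearsed with a small simulation (plain shell, no Kafka required, and the names `CONSUMERS`/`ASSIGN` are made up for illustration): partitions are handed out round-robin across the instances of one group, so every partition's messages reach exactly one instance per group.

```shell
#!/bin/sh
# Simulated round-robin assignment of partitions to the consumer
# instances of one group (illustration only; real Kafka runs its own
# partition assignor inside the group coordinator).
CONSUMERS="c1 c2 c3"          # instances in the same consumer group
set -- $CONSUMERS             # $1=c1 $2=c2 $3=c3
n=$#
i=0
ASSIGN=""
for p in 0 1 2 3 4 5; do      # six partitions of one topic
    idx=$(( i % n + 1 ))      # round-robin index, 1-based
    eval "owner=\${$idx}"
    ASSIGN="$ASSIGN$p:$owner "
    i=$(( i + 1 ))
done
echo "$ASSIGN"                # each partition has exactly one owner per group
```

A second group running the same simulation would get its own copy of every partition, which is how one topic is broadcast across groups but unicast within a group.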
Message delivery semantics:
- at most once: a message is sent at most once; whether it succeeds or fails, it is never resent.
- at least once: a message is sent at least once; if it does not succeed, it is sent again.
- exactly once: a message is delivered exactly once.
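The difference between at-most-once and at-least-once is just a retry loop; a sketch with a simulated send (the `fake_send` function is invented here to stand in for a broker acknowledgement, no Kafka involved) shows why at-least-once can produce duplicates:

```shell
#!/bin/sh
# Simulated at-least-once delivery: keep resending until the (fake) send
# succeeds. fake_send fails on the first attempt and succeeds afterwards,
# standing in for an acknowledgement that was lost in transit.
attempt=0
fake_send() {
    attempt=$(( attempt + 1 ))
    [ "$attempt" -ge 2 ]        # first call "fails", later calls succeed
}

sends=0
until fake_send; do
    sends=$(( sends + 1 ))      # count the unacknowledged deliveries too
done
sends=$(( sends + 1 ))          # the final, acknowledged delivery
echo "message delivered after $sends send(s)"
```

Under at-most-once the loop would be a single `fake_send` with no retry; the message may then be lost, but can never be duplicated.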
Flow diagram: (image not included in this text)
Installation:
Host1:
[root@localhost ~]# tar -zxvf zookeeper-3.4.5.tar.gz -C /usr/src/
[root@localhost ~]# cd /usr/src/
[root@localhost src]# mv zookeeper-3.4.5 /usr/local/zookeeper
[root@localhost src]# cd /usr/local/zookeeper/conf/
[root@localhost conf]# cp zoo_sample.cfg zoo.cfg
[root@localhost conf]# vim zoo.cfg
Modify:
dataDir=/usr/local/zookeeper/data
Add:
dataLogDir=/usr/local/zookeeper/datalog
server.1=192.168.43.176:2888:3888
server.2=192.168.43.104:2888:3888
server.3=192.168.43.23:2888:3888
Configuration file notes:
# The number of milliseconds of each tick
tickTime=2000 #interval, in milliseconds, between heartbeats among the zookeeper cluster nodes
# The number of ticks that the initial
# synchronization phase can take
initLimit=10 #when a new follower joins, it must finish copying the leader's data within this many ticks (10*2000 = 20000 ms here)
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5 #maximum number of ticks a follower may fall behind, or fail to respond, before it is considered dead
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/usr/local/zookeeper/data #path where the data (snapshot) files are stored
dataLogDir=/usr/local/zookeeper/datalog #path where the transaction log files are stored
server.1=192.168.43.176:2888:3888 #one line per cluster node; port 2888 is for inter-node communication, 3888 for leader election
server.2=192.168.43.104:2888:3888
server.3=192.168.43.23:2888:3888
# the port at which the clients will connect
clientPort=2181 #port zookeeper listens on for client connections
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
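tickTime, initLimit, and syncLimit combine into the actual timeouts; with the values above, the numbers work out as follows (plain arithmetic, no ZooKeeper needed):

```shell
#!/bin/sh
# Effective ZooKeeper timeouts derived from the zoo.cfg values above.
tickTime=2000      # ms per tick
initLimit=10       # ticks a new follower gets for its initial sync
syncLimit=5        # ticks a follower may lag before it is dropped

init_ms=$(( tickTime * initLimit ))   # initial-sync window
sync_ms=$(( tickTime * syncLimit ))   # follower lag timeout
echo "initial sync window: ${init_ms} ms"
echo "follower sync timeout: ${sync_ms} ms"
```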
Host1:
[root@localhost conf]# cd ../
[root@localhost zookeeper]# mkdir data
[root@localhost zookeeper]# mkdir datalog
[root@localhost zookeeper]# cd data
[root@localhost data]# echo "1" > myid
[root@localhost data]# scp -r /usr/local/zookeeper root@192.168.43.104:/usr/local/
[root@localhost data]# scp -r /usr/local/zookeeper root@192.168.43.23:/usr/local/
[root@localhost data]# iptables -F
[root@localhost data]# systemctl stop firewalld
[root@localhost data]# setenforce 0
[root@localhost data]# iptables-save
Host2:
[root@localhost ~]# cd /usr/local/zookeeper/data
[root@localhost data]# vim myid
2
[root@localhost data]# iptables -F
[root@localhost data]# systemctl stop firewalld
[root@localhost data]# setenforce 0
[root@localhost data]# iptables-save
Host3:
[root@localhost ~]# cd /usr/local/zookeeper/data
[root@localhost data]# vim myid
3
[root@localhost data]# iptables -F
[root@localhost data]# systemctl stop firewalld
[root@localhost data]# setenforce 0
[root@localhost data]# iptables-save
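The myid step above (one file per host, each holding a unique id that matches a server.N line in zoo.cfg) can be rehearsed locally; this sketch writes the three files into a temporary directory instead of onto three hosts:

```shell
#!/bin/sh
# Rehearse the per-host myid files locally: one data dir per "host",
# each containing the id that matches its server.N line in zoo.cfg.
base=$(mktemp -d)
for id in 1 2 3; do
    mkdir -p "$base/host$id/data"
    echo "$id" > "$base/host$id/data/myid"
done
cat "$base/host2/data/myid"   # the file for host 2 holds its id
```

In the real cluster the value in each host's myid must match the server.N entry for that host's IP, otherwise the node cannot join the quorum.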
Host1:
[root@localhost data]# cd ../bin/
[root@localhost bin]# ./zkServer.sh start
JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
Host2:
[root@localhost data]# cd ../bin/
[root@localhost bin]# ./zkServer.sh start
JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
Host3:
[root@localhost data]# cd ../bin/
[root@localhost bin]# ./zkServer.sh start
JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
Host1:
[root@localhost bin]# ./zkServer.sh status
JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Mode: follower
Host2:
[root@localhost bin]# ./zkServer.sh status
JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Mode: leader
Host3:
[root@localhost bin]# ./zkServer.sh status
JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Mode: follower
Host1:
[root@localhost ~]# tar -zxvf kafka_2.12-2.1.1.tgz -C /usr/src/
[root@localhost src]# mv kafka_2.12-2.1.1 /usr/local/kafka
[root@localhost src]# cd /usr/local/kafka/config/
[root@localhost config]# vim server.properties
Modify:
broker.id=1 #unique id of this broker
listeners=PLAINTEXT://192.168.43.176:9092 #address and port the broker listens on
log.dirs=/usr/local/kafka/data #path where message data is stored
log.retention.hours=168 #how long partition messages are retained; the unit is hours (168 h = 7 days)
Add:
message.max.bytes=1024000 #maximum size, in bytes, of a message the broker will accept from producers
default.replication.factor=2 #default number of replicas (leader included) per partition
replica.fetch.max.bytes=102400 #maximum number of bytes a replica fetches per request
Modify:
zookeeper.connect=192.168.43.176:2181,192.168.43.104:2181,192.168.43.23:2181
#zookeeper connection string listing every node (ip:2181).
num.partitions=1 #default number of partitions for a newly created topic
[root@localhost config]# mkdir ../data
[root@localhost config]# scp -r /usr/local/kafka root@192.168.43.104:/usr/local/
[root@localhost config]# scp -r /usr/local/kafka root@192.168.43.23:/usr/local/
Host2:
[root@localhost ~]# vim /usr/local/kafka/config/server.properties
Modify:
broker.id=2
listeners=PLAINTEXT://192.168.43.104:9092
Host3:
[root@localhost ~]# vim /usr/local/kafka/config/server.properties
Modify:
broker.id=3
listeners=PLAINTEXT://192.168.43.23:9092
Host1:
[root@localhost config]# cd ../bin/
[root@localhost bin]# ./kafka-server-start.sh -daemon ../config/server.properties
[root@localhost bin]# netstat -anput | grep 9092
tcp6 0 0 192.168.43.176:9092 :::* LISTEN 8434/java
tcp6 0 0 192.168.43.176:36610 192.168.43.176:9092 ESTABLISHED 8434/java
tcp6 0 0 192.168.43.176:9092 192.168.43.176:36610 ESTABLISHED 8434/java
Host2:
[root@localhost ~]# cd /usr/local/kafka/bin/
[root@localhost bin]# ./kafka-server-start.sh -daemon ../config/server.properties
[root@localhost bin]# netstat -anput | grep 9092
tcp6 0 0 192.168.43.104:9092 :::* LISTEN 8052/java
tcp6 0 0 192.168.43.104:9092 192.168.43.176:43400 ESTABLISHED 8052/java
Host3:
[root@localhost ~]# cd /usr/local/kafka/bin/
[root@localhost bin]# ./kafka-server-start.sh -daemon ../config/server.properties
[root@localhost bin]# netstat -anput | grep 9092
tcp6 0 0 192.168.43.23:9092 :::* LISTEN 5978/java
tcp6 0 0 192.168.43.23:9092 192.168.43.176:54410 ESTABLISHED 5978/java
Host1:
Create a topic:
[root@localhost bin]# ./kafka-topics.sh --create --zookeeper 192.168.43.176:2181 --partitions 2 --replication-factor 2 --topic topic
Created topic "topic".
List the created topics:
[root@localhost bin]# ./kafka-topics.sh --list --zookeeper 192.168.43.176:2181
topic
Show topic details:
[root@localhost bin]# ./kafka-topics.sh --zookeeper 192.168.43.176 --describe
Topic:__consumer_offsets PartitionCount:50 ReplicationFactor:1 Configs:segment.bytes=104857600,cleanup.policy=compact,compression.type=producer
Topic: __consumer_offsets Partition: 0 Leader: 3 Replicas: 3 Isr: 3
Topic: __consumer_offsets Partition: 1 Leader: 1 Replicas: 1 Isr: 1
Topic: __consumer_offsets Partition: 2 Leader: 2 Replicas: 2 Isr: 2
Topic: __consumer_offsets Partition: 3 Leader: 3 Replicas: 3 Isr: 3
Topic: __consumer_offsets Partition: 4 Leader: 1 Replicas: 1 Isr: 1
Topic: __consumer_offsets Partition: 5 Leader: 2 Replicas: 2 Isr: 2
Topic: __consumer_offsets Partition: 6 Leader: 3 Replicas: 3 Isr: 3
Topic: __consumer_offsets Partition: 7 Leader: 1 Replicas: 1 Isr: 1
Topic: __consumer_offsets Partition: 8 Leader: 2 Replicas: 2 Isr: 2
Topic: __consumer_offsets Partition: 9 Leader: 3 Replicas: 3 Isr: 3
Topic: __consumer_offsets Partition: 10 Leader: 1 Replicas: 1 Isr: 1
Topic: __consumer_offsets Partition: 11 Leader: 2 Replicas: 2 Isr: 2
Topic: __consumer_offsets Partition: 12 Leader: 3 Replicas: 3 Isr: 3
Topic: __consumer_offsets Partition: 13 Leader: 1 Replicas: 1 Isr: 1
Topic: __consumer_offsets Partition: 14 Leader: 2 Replicas: 2 Isr: 2
Topic: __consumer_offsets Partition: 15 Leader: 3 Replicas: 3 Isr: 3
Topic: __consumer_offsets Partition: 16 Leader: 1 Replicas: 1 Isr: 1
Topic: __consumer_offsets Partition: 17 Leader: 2 Replicas: 2 Isr: 2
Topic: __consumer_offsets Partition: 18 Leader: 3 Replicas: 3 Isr: 3
Topic: __consumer_offsets Partition: 19 Leader: 1 Replicas: 1 Isr: 1
Topic: __consumer_offsets Partition: 20 Leader: 2 Replicas: 2 Isr: 2
Topic: __consumer_offsets Partition: 21 Leader: 3 Replicas: 3 Isr: 3
Topic: __consumer_offsets Partition: 22 Leader: 1 Replicas: 1 Isr: 1
Topic: __consumer_offsets Partition: 23 Leader: 2 Replicas: 2 Isr: 2
Topic: __consumer_offsets Partition: 24 Leader: 3 Replicas: 3 Isr: 3
Topic: __consumer_offsets Partition: 25 Leader: 1 Replicas: 1 Isr: 1
Topic: __consumer_offsets Partition: 26 Leader: 2 Replicas: 2 Isr: 2
Topic: __consumer_offsets Partition: 27 Leader: 3 Replicas: 3 Isr: 3
Topic: __consumer_offsets Partition: 28 Leader: 1 Replicas: 1 Isr: 1
Topic: __consumer_offsets Partition: 29 Leader: 2 Replicas: 2 Isr: 2
Topic: __consumer_offsets Partition: 30 Leader: 3 Replicas: 3 Isr: 3
Topic: __consumer_offsets Partition: 31 Leader: 1 Replicas: 1 Isr: 1
Topic: __consumer_offsets Partition: 32 Leader: 2 Replicas: 2 Isr: 2
Topic: __consumer_offsets Partition: 33 Leader: 3 Replicas: 3 Isr: 3
Topic: __consumer_offsets Partition: 34 Leader: 1 Replicas: 1 Isr: 1
Topic: __consumer_offsets Partition: 35 Leader: 2 Replicas: 2 Isr: 2
Topic: __consumer_offsets Partition: 36 Leader: 3 Replicas: 3 Isr: 3
Topic: __consumer_offsets Partition: 37 Leader: 1 Replicas: 1 Isr: 1
Topic: __consumer_offsets Partition: 38 Leader: 2 Replicas: 2 Isr: 2
Topic: __consumer_offsets Partition: 39 Leader: 3 Replicas: 3 Isr: 3
Topic: __consumer_offsets Partition: 40 Leader: 1 Replicas: 1 Isr: 1
Topic: __consumer_offsets Partition: 41 Leader: 2 Replicas: 2 Isr: 2
Topic: __consumer_offsets Partition: 42 Leader: 3 Replicas: 3 Isr: 3
Topic: __consumer_offsets Partition: 43 Leader: 1 Replicas: 1 Isr: 1
Topic: __consumer_offsets Partition: 44 Leader: 2 Replicas: 2 Isr: 2
Topic: __consumer_offsets Partition: 45 Leader: 3 Replicas: 3 Isr: 3
Topic: __consumer_offsets Partition: 46 Leader: 1 Replicas: 1 Isr: 1
Topic: __consumer_offsets Partition: 47 Leader: 2 Replicas: 2 Isr: 2
Topic: __consumer_offsets Partition: 48 Leader: 3 Replicas: 3 Isr: 3
Topic: __consumer_offsets Partition: 49 Leader: 1 Replicas: 1 Isr: 1
Topic:topic PartitionCount:2 ReplicationFactor:2 Configs:
Topic: topic Partition: 0 Leader: 3 Replicas: 3,1 Isr: 3,1
Topic: topic Partition: 1 Leader: 1 Replicas: 1,2 Isr: 1,2
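Each Leader/Replicas/Isr column in the describe output can be read mechanically; a small awk sketch over one of the lines above extracts the leader and the in-sync replica set (hypothetical parsing — the exact output format may vary between Kafka versions):

```shell
#!/bin/sh
# Pull Leader and Isr out of one `--describe` output line.
line='Topic: topic Partition: 0 Leader: 3 Replicas: 3,1 Isr: 3,1'
leader=$(echo "$line" | awk '{for(i=1;i<NF;i++) if($i=="Leader:") print $(i+1)}')
isr=$(echo "$line"    | awk '{for(i=1;i<NF;i++) if($i=="Isr:")    print $(i+1)}')
echo "leader=$leader isr=$isr"
```

For the `topic` created above, partition 0 is led by broker 3 with broker 1 as follower, and partition 1 is led by broker 1 with broker 2 as follower; Isr listing both replicas means the followers are fully caught up.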
Produce messages:
[root@localhost bin]# ./kafka-console-producer.sh --broker-list 192.168.43.104:9092,192.168.43.23:9092 --topic topic
Host2:
Consume messages:
[root@localhost bin]# ./kafka-console-consumer.sh --bootstrap-server 192.168.43.104:9092 --topic topic --from-beginning
test
Host3:
Consume messages:
[root@localhost bin]# ./kafka-console-consumer.sh --bootstrap-server 192.168.43.104:9092 --topic topic --from-beginning
test
Command reference:
Topic:
kafka-topics.sh
--create #create a topic
--zookeeper #the Kafka cluster depends on zookeeper and topic metadata is written there; pass the zookeeper connection info (local ip:2181)
--partitions #number of partitions to create
--replication-factor #number of replicas; must not exceed the number of brokers
--topic #name of the topic
Create a topic:
./kafka-topics.sh --create --zookeeper <local-ip>:2181 --partitions 2 --replication-factor 2 --topic first
List topics:
./kafka-topics.sh --list --zookeeper <local-ip>:2181
Show topic details:
./kafka-topics.sh --zookeeper <local-ip>:2181 --describe
Delete a topic:
./kafka-topics.sh --delete --zookeeper <zookeeper-cluster-address> --topic <topic-name>
Producer command:
kafka-console-producer.sh
--broker-list #list of brokers, port 9092
--topic #name of the topic
Consumer command:
kafka-console-consumer.sh
--zookeeper #zookeeper connection (legacy option).
--topic #name of the topic.
--from-beginning #consume everything produced since the topic was created.
--bootstrap-server #broker address, port 9092. Kafka itself talks to the partition leaders; if offsets were stored in zookeeper, consumers would have to talk to zookeeper constantly, which hurts efficiency. When connecting via the bootstrap servers (the Kafka cluster), offsets are committed to a topic named __consumer_offsets.