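I. Concepts
Kafka was originally developed at LinkedIn. It is a distributed messaging system that supports partitions and multiple replicas and relies on ZooKeeper for coordination. Its defining feature is the ability to process large volumes of data in real time across many scenarios: Hadoop-based batch processing, low-latency real-time systems, Storm/Spark stream processing engines, web/nginx logs, access logs, message services, and so on. Kafka is written in Scala and Java. LinkedIn open-sourced it in 2011, and it later became a top-level Apache project.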
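1.1 Background
Today's applications (commerce, social networking, search, browsing, and so on) act like information factories that constantly produce data. In the big data era, this raises the following challenges:
- How to collect this huge volume of information
- How to analyze it
- How to do both in a timely manner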
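1.2 Kafka features
- High throughput and low latency: Kafka can handle hundreds of thousands of messages per second, with latency as low as a few milliseconds
- Scalability: a Kafka cluster can be expanded without downtime
- Durability and reliability: messages are persisted to local disk, and replication guards against data loss
- Fault tolerance: nodes in the cluster are allowed to fail (with n replicas, up to n-1 node failures can be tolerated)
- High concurrency: thousands of clients can read and write at the same time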
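1.3 Kafka use cases
Log collection: a company can use Kafka to collect logs from its services and expose them through a uniform interface to consumers such as Hadoop, HBase, and Solr
Messaging: decoupling producers from consumers, buffering messages, and so on
User activity tracking: Kafka is often used to record the activity of web or app users, such as page views, searches, and clicks. Servers publish these events to Kafka topics, and subscribers consume the topics for real-time monitoring and analysis, or load the data into Hadoop or a data warehouse for offline analysis and mining
Operational metrics: Kafka is also commonly used to record operational monitoring data. This includes collecting data from distributed applications and producing centralized feedback on operations, such as alerts and reports
Stream processing: for example, Spark Streaming and Storm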
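1.4 Point-to-point model
As shown in the figure above, the point-to-point model is usually a pull-based or polling-based delivery model in which each message sent to the queue is processed by one and only one consumer. After the producer puts a message into the queue, consumers actively pull messages to consume them.
Pros and cons of point-to-point: consumers control how often they pull messages, but they cannot tell whether the queue currently holds messages to consume, so the consumer side needs an extra thread to poll and monitor the queue.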
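1.5 Publish-subscribe model
The publish-subscribe model is a push-based delivery model that allows multiple different subscribers. After the producer puts a message into the queue, the queue pushes it to every consumer subscribed to that kind of message (much like following an official account). Consumers receive pushes passively, so they do not need to check whether the queue still has messages waiting. However, a consumer node with weak resources processes messages more slowly, and the queue has no way to sense each consumer's speed. Suppose consumer1 can handle 10 MB/s, consumer2 5 MB/s, and consumer3 2 MB/s: if the queue pushes at 5 MB/s, consumer3 is overwhelmed; if it pushes at 2 MB/s, consumer1 and consumer2 have capacity going to waste.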
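The mismatch in the example above comes down to simple arithmetic. The following is a minimal sketch, not part of any Kafka tooling: `classify_consumers` is a hypothetical helper that compares one fixed push rate with each consumer's capacity, using the MB/s figures from the example.

```shell
# Classify each consumer against a fixed push rate (all values in MB/s).
# Usage: classify_consumers PUSH_RATE name=capacity ...
classify_consumers() {
    push_rate=$1
    shift
    for entry in "$@"; do
        name=${entry%%=*}       # text before the first '='
        capacity=${entry#*=}    # text after the first '='
        if [ "$capacity" -lt "$push_rate" ]; then
            echo "$name: overloaded (capacity $capacity < push $push_rate)"
        elif [ "$capacity" -gt "$push_rate" ]; then
            echo "$name: underused (capacity $capacity > push $push_rate)"
        else
            echo "$name: matched"
        fi
    done
}

# Push at 5 MB/s, then at 2 MB/s, against the same three consumers.
classify_consumers 5 consumer1=10 consumer2=5 consumer3=2
classify_consumers 2 consumer1=10 consumer2=5 consumer3=2
```

No single push rate suits all three consumers, which is the weakness of a push model that a pull model avoids by letting each consumer fetch at its own pace.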
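1.6 Kafka characteristics
Pros:
- High reliability (distributed, partitioned, replicated)
- Scalability (elastic)
- High performance (reads and writes)
- Durability (data persistence)
- Timeliness
Cons:
- Messages are sent in batches, so delivery is not truly real time
- Ordering is guaranteed only within a single partition; global ordering is not available
- Messages may be consumed more than once
- Depends on ZooKeeper for metadata management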
1.7 Kafka architecture
Architecture (APIs): Producer, Consumer, Streams API, Connect API
Flow: producer, Kafka cluster, consumer, ZooKeeper
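1.8 Relationship between partitions and the consumers in a consumer group
Partitions = consumer parallelism: an exact fit, each consumer reads one partition
Partitions > consumer parallelism: some consumers read more than one partition
Partitions < consumer parallelism: some consumers sit idle (a group may contain more consumers than partitions, but the extra consumers receive no data)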
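The three cases can be reproduced with a small simulation. This is a simplified round-robin sketch for illustration only; it is not Kafka's actual partition assignor, and `assign_partitions` is a hypothetical helper name.

```shell
# Assign partitions 0..P-1 to consumers 0..C-1 round-robin and print
# one line per consumer. Simplified illustration, not Kafka's assignor.
# Usage: assign_partitions PARTITIONS CONSUMERS
assign_partitions() {
    partitions=$1
    consumers=$2
    c=0
    while [ "$c" -lt "$consumers" ]; do
        owned=""
        p=$c
        # Consumer c takes partitions c, c+C, c+2C, ...
        while [ "$p" -lt "$partitions" ]; do
            owned="$owned $p"
            p=$((p + consumers))
        done
        if [ -z "$owned" ]; then
            echo "consumer-$c: idle"
        else
            echo "consumer-$c:$owned"
        fi
        c=$((c + 1))
    done
}

assign_partitions 3 3   # one partition per consumer
assign_partitions 3 2   # one consumer reads two partitions
assign_partitions 3 4   # one consumer stays idle
```

With 3 partitions and 4 consumers, the last line printed is `consumer-3: idle`, which matches the third case above.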
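1.9 Partition count, consumers, and read efficiency
The more partitions a topic has, the more consumers in a group can read at the same time. Overall consumption throughput rises, provided there are enough consumers to make use of the extra partitions.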
II. Cluster deployment
Hostname | Host IP | ZooKeeper version | Kafka version |
---|---|---|---|
worker01 | 192.168.200.104 | apache-zookeeper-3.7.1 | kafka_2.12-3.3.1 |
worker02 | 192.168.200.105 | apache-zookeeper-3.7.1 | kafka_2.12-3.3.1 |
worker03 | 192.168.200.106 | apache-zookeeper-3.7.1 | kafka_2.12-3.3.1 |
I. Environment configuration
Set the hostname (run the matching command on worker01, worker02, and worker03 respectively)
[root@localhost ~]# hostnamectl set-hostname worker01 && bash
[root@localhost ~]# hostnamectl set-hostname worker02 && bash
[root@localhost ~]# hostnamectl set-hostname worker03 && bash
Configure /etc/hosts (run on worker01, worker02, and worker03)
[root@worker01 ~]# cat >> /etc/hosts << EOF
192.168.200.104 worker01
192.168.200.105 worker02
192.168.200.106 worker03
EOF
[root@worker02 ~]# cat >> /etc/hosts << EOF
192.168.200.104 worker01
192.168.200.105 worker02
192.168.200.106 worker03
EOF
[root@worker03 ~]# cat >> /etc/hosts << EOF
192.168.200.104 worker01
192.168.200.105 worker02
192.168.200.106 worker03
EOF
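Because `cat >> /etc/hosts` appends unconditionally, running it twice duplicates the entries. One possible way to make the step repeatable is sketched below. `add_host_entry` and the `HOSTS_FILE` variable are hypothetical names, and the default target is a scratch file rather than the real /etc/hosts.

```shell
# Append "IP NAME" to a hosts file only if that exact line is not already there.
# Usage: add_host_entry FILE IP NAME
add_host_entry() {
    hosts_file=$1
    ip=$2
    name=$3
    # -x matches the whole line, -F treats the pattern as a fixed string
    if ! grep -qxF "$ip $name" "$hosts_file" 2>/dev/null; then
        echo "$ip $name" >> "$hosts_file"
    fi
}

# HOSTS_FILE is an assumption for this sketch: it defaults to a scratch file
# so the example can be tried safely. Set it to /etc/hosts on the real nodes.
HOSTS_FILE=${HOSTS_FILE:-/tmp/hosts.demo}
add_host_entry "$HOSTS_FILE" 192.168.200.104 worker01
add_host_entry "$HOSTS_FILE" 192.168.200.105 worker02
add_host_entry "$HOSTS_FILE" 192.168.200.106 worker03
```

Running the block a second time leaves the file unchanged.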
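II. Install JDK 1.8
Run on worker01, worker02, and worker03
# List the available JDK packages
yum list |grep jdk
# Install the version that matches the system
yum install -y java-1.8.0-openjdk-devel.x86_64
# Check that the JDK is installed and working
java -version
# Later on, jps can be used to list all Java processes running on the host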
jps
III. Install the ZooKeeper cluster
3.1 Create directories and extract the archive
# worker01
[root@worker01 ~]# mkdir -pv /root/software/kafaka-zookeeper/{data,logs}
mkdir: created directory "/root/software/"
mkdir: created directory "/root/software/kafaka-zookeeper"
mkdir: created directory "/root/software/kafaka-zookeeper/data"
mkdir: created directory "/root/software/kafaka-zookeeper/logs"
[root@worker01 ~]# tar -xvf /root/software/apache-zookeeper-3.7.1-bin.tar.gz -C /root/software/kafaka-zookeeper
[root@worker01 ~]# mv /root/software/kafaka-zookeeper/apache-zookeeper-3.7.1-bin/ /root/software/kafaka-zookeeper/apache-zookeeper
# worker02
[root@worker02 ~]# mkdir -pv /root/software/kafaka-zookeeper/{data,logs}
mkdir: created directory "/root/software/"
mkdir: created directory "/root/software/kafaka-zookeeper"
mkdir: created directory "/root/software/kafaka-zookeeper/data"
mkdir: created directory "/root/software/kafaka-zookeeper/logs"
[root@worker02 ~]# tar -xvf /root/software/apache-zookeeper-3.7.1-bin.tar.gz -C /root/software/kafaka-zookeeper
[root@worker02 ~]# mv /root/software/kafaka-zookeeper/apache-zookeeper-3.7.1-bin/ /root/software/kafaka-zookeeper/apache-zookeeper
# worker03
[root@worker03 ~]# mkdir -pv /root/software/kafaka-zookeeper/{data,logs}
mkdir: created directory "/root/software/"
mkdir: created directory "/root/software/kafaka-zookeeper"
mkdir: created directory "/root/software/kafaka-zookeeper/data"
mkdir: created directory "/root/software/kafaka-zookeeper/logs"
[root@worker03 ~]# tar -xvf /root/software/apache-zookeeper-3.7.1-bin.tar.gz -C /root/software/kafaka-zookeeper
[root@worker03 ~]# mv /root/software/kafaka-zookeeper/apache-zookeeper-3.7.1-bin/ /root/software/kafaka-zookeeper/apache-zookeeper
3.2 Edit the configuration file
[root@worker01 ~ ]# cd /root/software/kafaka-zookeeper/apache-zookeeper/conf
[root@worker01 conf]# mv zoo_sample.cfg zoo.cfg
[root@worker01 conf]# vim /root/software/kafaka-zookeeper/apache-zookeeper/conf/zoo.cfg
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/root/software/kafaka-zookeeper/data
dataLogDir=/root/software/kafaka-zookeeper/logs
server.1=192.168.200.104:2888:3888
server.2=192.168.200.105:2888:3888
server.3=192.168.200.106:2888:3888
[root@worker02 ~ ]# cd /root/software/kafaka-zookeeper/apache-zookeeper/conf
[root@worker02 conf]# mv zoo_sample.cfg zoo.cfg
[root@worker02 conf]# vim /root/software/kafaka-zookeeper/apache-zookeeper/conf/zoo.cfg
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/root/software/kafaka-zookeeper/data
dataLogDir=/root/software/kafaka-zookeeper/logs
server.1=192.168.200.104:2888:3888
server.2=192.168.200.105:2888:3888
server.3=192.168.200.106:2888:3888
[root@worker03 ~ ]# cd /root/software/kafaka-zookeeper/apache-zookeeper/conf
[root@worker03 conf]# mv zoo_sample.cfg zoo.cfg
[root@worker03 conf]# vim /root/software/kafaka-zookeeper/apache-zookeeper/conf/zoo.cfg
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/root/software/kafaka-zookeeper/data
dataLogDir=/root/software/kafaka-zookeeper/logs
server.1=192.168.200.104:2888:3888
server.2=192.168.200.105:2888:3888
server.3=192.168.200.106:2888:3888
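Since the three nodes use an identical zoo.cfg, the file can also be generated once instead of edited by hand on each host. The sketch below is one way to do that under stated assumptions: `render_zoo_cfg` is a hypothetical helper, and `clientPort=2181` is assumed to be carried over from zoo_sample.cfg (the status output in 3.4 reports port 2181).

```shell
# Print a zoo.cfg for an ensemble. Server IDs are assigned in argument order.
# Usage: render_zoo_cfg BASE_DIR IP...
render_zoo_cfg() {
    base_dir=$1
    shift
    cat <<EOF
tickTime=2000
initLimit=10
syncLimit=5
dataDir=$base_dir/data
dataLogDir=$base_dir/logs
clientPort=2181
EOF
    id=1
    for ip in "$@"; do
        # 2888 is the quorum port, 3888 the leader election port
        echo "server.$id=$ip:2888:3888"
        id=$((id + 1))
    done
}

render_zoo_cfg /root/software/kafaka-zookeeper \
    192.168.200.104 192.168.200.105 192.168.200.106
```

Redirecting the output to conf/zoo.cfg on each node gives the same file everywhere, so the `server.N` lines cannot drift apart.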
3.3 Create myid
[root@worker01 ~ ]# cd /root/software/kafaka-zookeeper/data
[root@worker01 data]# touch myid
[root@worker01 data]# echo 1 > /root/software/kafaka-zookeeper/data/myid
[root@worker02 ~ ]# cd /root/software/kafaka-zookeeper/data
[root@worker02 data]# touch myid
[root@worker02 data]# echo 2 > /root/software/kafaka-zookeeper/data/myid
[root@worker03 ~ ]# cd /root/software/kafaka-zookeeper/data
[root@worker03 data]# touch myid
[root@worker03 data]# echo 3 > /root/software/kafaka-zookeeper/data/myid
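Each myid must equal the N of the `server.N` line that carries the node's own address. As a hedged sketch, the value can be derived from zoo.cfg rather than typed by hand; `myid_for_ip` is a hypothetical helper, and the example uses a temporary copy of the three server lines.

```shell
# Print the N of the "server.N=IP:..." line in a zoo.cfg that matches the given IP.
# Usage: myid_for_ip CFG_FILE IP
myid_for_ip() {
    cfg=$1
    ip=$2
    # Split on '=' and ':' so that $1 is "server.N" and $2 is the address
    awk -F'[=:]' -v ip="$ip" '
        $1 ~ /^server\./ && $2 == ip { sub(/^server\./, "", $1); print $1 }
    ' "$cfg"
}

# Example with a temporary file holding the server lines from 3.2.
cfg=$(mktemp)
cat > "$cfg" <<EOF
server.1=192.168.200.104:2888:3888
server.2=192.168.200.105:2888:3888
server.3=192.168.200.106:2888:3888
EOF
myid_for_ip "$cfg" 192.168.200.105   # prints 2
rm -f "$cfg"
```

On a real node the result could be written straight to the data directory, for example `myid_for_ip zoo.cfg 192.168.200.105 > /root/software/kafaka-zookeeper/data/myid`.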
3.4 Start ZooKeeper and check its status
[root@worker01 ~ ]# /root/software/kafaka-zookeeper/apache-zookeeper/bin/zkServer.sh start
[root@worker01 ~ ]# /root/software/kafaka-zookeeper/apache-zookeeper/bin/zkServer.sh status
/usr/bin/java
ZooKeeper JMX enabled by default
Using config: /root/software/kafaka-zookeeper/apache-zookeeper/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: follower
[root@worker02 ~ ]# /root/software/kafaka-zookeeper/apache-zookeeper/bin/zkServer.sh start
[root@worker02 ~ ]# /root/software/kafaka-zookeeper/apache-zookeeper/bin/zkServer.sh status
/usr/bin/java
ZooKeeper JMX enabled by default
Using config: /root/software/kafaka-zookeeper/apache-zookeeper/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: leader
[root@worker03 ~ ]# /root/software/kafaka-zookeeper/apache-zookeeper/bin/zkServer.sh start
[root@worker03 ~ ]# /root/software/kafaka-zookeeper/apache-zookeeper/bin/zkServer.sh status
/usr/bin/java
ZooKeeper JMX enabled by default
Using config: /root/software/kafaka-zookeeper/apache-zookeeper/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: follower
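A healthy three-node ensemble reports exactly one leader and two followers, as in the output above. The check can be scripted; the following is a minimal sketch in which `zk_mode` and `count_leaders` are hypothetical helpers and the sample text is copied from the status output shown above.

```shell
# Read `zkServer.sh status` output on stdin and print the value of the "Mode:" line.
zk_mode() {
    sed -n 's/^Mode: *//p'
}

# Count how many of the given modes are "leader".
# Usage: count_leaders MODE...
count_leaders() {
    n=0
    for mode in "$@"; do
        if [ "$mode" = "leader" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Sample status output from one node.
status_output='ZooKeeper JMX enabled by default
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: follower'
printf '%s\n' "$status_output" | zk_mode      # prints follower
count_leaders follower leader follower        # prints 1
```

On a live node the first helper would be fed with `zkServer.sh status 2>&1 | zk_mode`.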
# Run the client (how Kafka registers itself is covered later)
/root/software/kafaka-zookeeper/apache-zookeeper/bin/zkCli.sh -server 192.168.200.104:2181
/root/software/kafaka-zookeeper/apache-zookeeper/bin/zkCli.sh -server 192.168.200.105:2181
/root/software/kafaka-zookeeper/apache-zookeeper/bin/zkCli.sh -server 192.168.200.106:2181
IV. Install the Kafka cluster
4.1 Create directories and extract the archive
[root@worker01 ~]# cd /root/software/
[root@worker01 software]# ls
apache-zookeeper-3.7.1-bin.tar.gz kafaka-zookeeper kafka_2.12-3.3.1.tgz
[root@worker01 software]# mkdir -pv /root/software/kafaka-zookeeper/kafka_2.12-3.3.1/{data,logs}
[root@worker01 software]# ls
apache-zookeeper-3.7.1-bin.tar.gz kafaka-zookeeper kafka_2.12-3.3.1.tgz
[root@worker01 software]# tar -xvf /root/software/kafka_2.12-3.3.1.tgz -C /root/software/kafaka-zookeeper/kafka_2.12-3.3.1/
[root@worker01 software]# cd /root/software/kafaka-zookeeper/kafka_2.12-3.3.1/
[root@worker01 kafka_2.12-3.3.1]# mv kafka_2.12-3.3.1/ kafka
[root@worker02 ~]# cd /root/software/
[root@worker02 software]# ls
apache-zookeeper-3.7.1-bin.tar.gz kafaka-zookeeper kafka_2.12-3.3.1.tgz
[root@worker02 software]# mkdir -pv /root/software/kafaka-zookeeper/kafka_2.12-3.3.1/{data,logs}
[root@worker02 software]# ls
apache-zookeeper-3.7.1-bin.tar.gz kafaka-zookeeper kafka_2.12-3.3.1.tgz
[root@worker02 software]# tar -xvf /root/software/kafka_2.12-3.3.1.tgz -C /root/software/kafaka-zookeeper/kafka_2.12-3.3.1/
[root@worker02 software]# cd /root/software/kafaka-zookeeper/kafka_2.12-3.3.1/
[root@worker02 kafka_2.12-3.3.1]# mv kafka_2.12-3.3.1/ kafka
[root@worker03 ~]# cd /root/software/
[root@worker03 software]# ls
apache-zookeeper-3.7.1-bin.tar.gz kafaka-zookeeper kafka_2.12-3.3.1.tgz
[root@worker03 software]# mkdir -pv /root/software/kafaka-zookeeper/kafka_2.12-3.3.1/{data,logs}
[root@worker03 software]# ls
apache-zookeeper-3.7.1-bin.tar.gz kafaka-zookeeper kafka_2.12-3.3.1.tgz
[root@worker03 software]# tar -xvf /root/software/kafka_2.12-3.3.1.tgz -C /root/software/kafaka-zookeeper/kafka_2.12-3.3.1/
[root@worker03 software]# cd /root/software/kafaka-zookeeper/kafka_2.12-3.3.1/
[root@worker03 kafka_2.12-3.3.1]# mv kafka_2.12-3.3.1/ kafka
4.2 Edit the configuration file
[root@worker01 config]# cd /root/software/kafaka-zookeeper/kafka_2.12-3.3.1/kafka/config
[root@worker01 config]# cp server.properties server.properties.bak
[root@worker01 config]# sed -i "/#/d" server.properties
[root@worker02 config]# cd /root/software/kafaka-zookeeper/kafka_2.12-3.3.1/kafka/config
[root@worker02 config]# cp server.properties server.properties.bak
[root@worker02 config]# sed -i "/#/d" server.properties
[root@worker03 config]# cd /root/software/kafaka-zookeeper/kafka_2.12-3.3.1/kafka/config
[root@worker03 config]# cp server.properties server.properties.bak
[root@worker03 config]# sed -i "/#/d" server.properties
vim server.properties
# worker01 192.168.200.104 configuration
broker.id=1
listeners=PLAINTEXT://192.168.200.104:9092
num.network.threads=12
num.io.threads=24
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
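# Note: this is the same path as ZooKeeper's dataLogDir in zoo.cfg; a dedicated Kafka directory (for example the data directory created in 4.1) keeps the two apart
log.dirs=/root/software/kafaka-zookeeper/logs/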
num.partitions=3
num.recovery.threads.per.data.dir=12
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=3
transaction.state.log.min.isr=3
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
zookeeper.connect=192.168.200.104:2181,192.168.200.105:2181,192.168.200.106:2181
zookeeper.connection.timeout.ms=18000
group.initial.rebalance.delay.ms=0
# worker02 192.168.200.105 configuration
broker.id=2
listeners=PLAINTEXT://192.168.200.105:9092
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/root/software/kafaka-zookeeper/logs/
num.partitions=3
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
log.retention.hours=168
log.retention.check.interval.ms=300000
zookeeper.connect=192.168.200.104:2181,192.168.200.105:2181,192.168.200.106:2181
zookeeper.connection.timeout.ms=18000
group.initial.rebalance.delay.ms=0
# worker03 192.168.200.106 configuration
broker.id=3
listeners=PLAINTEXT://192.168.200.106:9092
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/root/software/kafaka-zookeeper/logs/
num.partitions=3
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
log.retention.hours=168
log.retention.check.interval.ms=300000
zookeeper.connect=192.168.200.104:2181,192.168.200.105:2181,192.168.200.106:2181
zookeeper.connection.timeout.ms=18000
group.initial.rebalance.delay.ms=0
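Only `broker.id` and `listeners` have to differ between the brokers, so they can be patched into a shared file instead of editing each copy in vim. The helper below is a sketch under that assumption; `set_property` is a hypothetical name, and the example works on a temporary file rather than the real server.properties.

```shell
# Set KEY=VALUE in a properties file: replace the line if KEY is already
# present, otherwise append it. Keys are compared literally, not as regexes.
# Usage: set_property FILE KEY VALUE
set_property() {
    file=$1
    key=$2
    value=$3
    awk -F= -v k="$key" -v v="$value" '
        $1 == k { print k "=" v; replaced = 1; next }
        { print }
        END { if (!replaced) print k "=" v }
    ' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Example: turn a stripped-down default file into worker02's settings.
conf=$(mktemp)
printf 'broker.id=0\nnum.partitions=3\n' > "$conf"
set_property "$conf" broker.id 2
set_property "$conf" listeners PLAINTEXT://192.168.200.105:9092
cat "$conf"
rm -f "$conf"
```

Writing through a temporary file avoids relying on `sed -i`, whose syntax differs between GNU and BSD systems.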
4.3 Start the Kafka cluster
[root@worker01 ~]# cd /root/software/kafaka-zookeeper/kafka_2.12-3.3.1/kafka/bin/
[root@worker01 bin]# ./kafka-server-start.sh -daemon ../config/server.properties
[root@worker01 bin]# netstat -anpt | grep java
[root@worker02 ~]# cd /root/software/kafaka-zookeeper/kafka_2.12-3.3.1/kafka/bin/
[root@worker02 bin]# ./kafka-server-start.sh -daemon ../config/server.properties
[root@worker02 bin]# netstat -anpt | grep java
tcp6 0 0 :::24439 :::* LISTEN 2143/java
tcp6 0 0 :::18814 :::* LISTEN 1650/java
tcp6 1 0 192.168.200.105:9092 :::* LISTEN 2143/java
tcp6 0 0 :::2181 :::* LISTEN 1650/java
tcp6 0 0 192.168.200.105:2888 :::* LISTEN 1650/java
tcp6 0 0 192.168.200.105:3888 :::* LISTEN 1650/java
tcp6 0 0 :::8080 :::* LISTEN 1650/java
tcp6 0 20 192.168.200.105:2888 192.168.200.104:13934 ESTABLISHED 1650/java
tcp6 0 20 192.168.200.105:2888 192.168.200.106:18286 ESTABLISHED 1650/java
tcp6 0 0 192.168.200.105:3888 192.168.200.106:64502 ESTABLISHED 1650/java
tcp6 0 0 192.168.200.105:49596 192.168.200.104:2181 ESTABLISHED 2143/java
tcp6 0 0 192.168.200.105:7834 192.168.200.104:3888 ESTABLISHED 1650/java
[root@worker03 ~]# cd /root/software/kafaka-zookeeper/kafka_2.12-3.3.1/kafka/bin/
[root@worker03 bin]# ./kafka-server-start.sh -daemon ../config/server.properties
[root@worker03 bin]# netstat -anpt | grep java
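Reading the netstat output by eye works for three nodes, but the same check can be automated. The sketch below assumes the column layout of `netstat -anpt` shown in this section (local address in column 4, state in column 6); `has_listener` is a hypothetical helper, and the sample lines are taken from the output on 192.168.200.105.

```shell
# Succeed if the netstat output on stdin contains a LISTEN socket on the given port.
# Usage: netstat -anpt | has_listener PORT
has_listener() {
    port=$1
    awk -v port="$port" '
        # The port is the last ":"-separated piece of the local address
        $6 == "LISTEN" { n = split($4, a, ":"); if (a[n] == port) found = 1 }
        END { exit !found }
    '
}

sample='tcp6 1 0 192.168.200.105:9092 :::* LISTEN 2143/java
tcp6 0 0 :::2181 :::* LISTEN 1650/java
tcp6 0 0 192.168.200.105:49596 192.168.200.104:2181 ESTABLISHED 2143/java'

for p in 9092 2181 9093; do
    if printf '%s\n' "$sample" | has_listener "$p"; then
        echo "port $p: listening"
    else
        echo "port $p: not listening"
    fi
done
```

Port 9092 is the Kafka listener and 2181 the ZooKeeper client port; 9093 is included only to show the negative case.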
4.4 Verify the Kafka cluster
List all topics
./kafka-topics.sh --list --bootstrap-server 192.168.200.104:9092
./kafka-topics.sh --list --bootstrap-server 192.168.200.105:9092
./kafka-topics.sh --list --bootstrap-server 192.168.200.106:9092
# Publish messages on worker01 (192.168.200.104)
./kafka-console-producer.sh --bootstrap-server 192.168.200.104:9092 --topic zy
I'm doing well
zoo
# Consume on worker02 (192.168.200.105)
./kafka-console-consumer.sh --bootstrap-server 192.168.200.105:9092 --topic zy --from-beginning
I'm doing well
zoo
# Consume on worker03 (192.168.200.106)
./kafka-console-consumer.sh --bootstrap-server 192.168.200.106:9092 --topic zy --from-beginning
I'm doing well
zoo
4.5 Verify the ZooKeeper cluster
cd /root/software/kafaka-zookeeper/kafka_2.12-3.3.1/kafka/bin/
# Open the ZooKeeper client. With a custom port, -server IP:port must be given; otherwise the client connects to the default port 2181
/root/software/kafaka-zookeeper/apache-zookeeper/bin/zkCli.sh -server 192.168.200.104:2181
/root/software/kafaka-zookeeper/apache-zookeeper/bin/zkCli.sh -server 192.168.200.105:2181
/root/software/kafaka-zookeeper/apache-zookeeper/bin/zkCli.sh -server 192.168.200.106:2181
# Inspect the registered brokers and topics
[zk: 192.168.200.104:2181(CONNECTED) 0] ls /brokers/ids
[1, 2, 3]
[zk: 192.168.200.104:2181(CONNECTED) 1] ls /brokers/topics
[zhangjialuo, zylinux]
# Open the ZooKeeper client on the default port
./zkCli.sh
[zk: localhost:2181(CONNECTED) 1] ls /
[admin, brokers, cluster, config, consumers, controller, controller_epoch, feature, isr_change_notification, latest_producer_id_block, log_dir_event_notification, zookeeper]
[zk: localhost:2181(CONNECTED) 2] ls /config
[brokers, changes, clients, ips, topics, users]
[zk: localhost:2181(CONNECTED) 3] ls /config/brokers
[]
[zk: localhost:2181(CONNECTED) 4] ls /config/topics
[__consumer_offsets, hkzs, zbqy]
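The `ls /brokers/ids` result earlier in this section is the quickest sign that all three brokers registered. As a small sketch, the list can be counted in a script; `count_broker_ids` is a hypothetical helper that takes the bracketed list as text on stdin and does not talk to ZooKeeper itself.

```shell
# Count the IDs in a zkCli list such as "[1, 2, 3]" read from stdin.
count_broker_ids() {
    # Strip brackets and spaces, put one ID per line, count lines with a digit.
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    sed 's/[][ ]//g' | tr ',' '\n' | grep -c '[0-9]' || true
}

expected=3
actual=$(echo "[1, 2, 3]" | count_broker_ids)
if [ "$actual" -eq "$expected" ]; then
    echo "all $expected brokers registered"
else
    echo "only $actual of $expected brokers registered"
fi
```

An empty list (`[]`) yields 0, which would indicate that no broker has registered yet.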