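Kafka Setup
Kafka Core Concepts
- Kafka is a high-throughput distributed publish-subscribe messaging system that can handle all the activity-stream data generated by users of a website.
- Kafka aims to unify online and offline message processing through Hadoop's parallel loading mechanism, and to provide real-time messaging through a cluster.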
Broker
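A Kafka cluster consists of one or more servers, each of which is called a broker.
Topic
Every message published to a Kafka cluster has a category, called a Topic. (Physically, messages of different Topics are stored separately. Logically, a Topic's messages may be stored on one or more brokers, but users only need to specify the Topic to produce or consume data, without caring where the data is stored.)
Partition
A Partition is a physical concept; each Topic contains one or more Partitions.
Producer
Publishes messages to a Kafka broker.
Consumer
A message consumer: a client that reads messages from a Kafka broker.
Consumer Group
Each Consumer belongs to a specific Consumer Group (a group name can be specified for each Consumer; if none is specified, the Consumer belongs to the default group).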
Kafka Installation
- ZooKeeper installation steps are omitted;
- Modify the Kafka configuration files
# Add environment variables
vim ~/.bashrc
export KAFKA_HOME=/home/hadoop/kafka_2.11-2.3.1
export PATH=$PATH:$JAVA_HOME/bin:$FLUME_HOME/bin:$KAFKA_HOME/bin
# Apply the environment variables
source ~/.bashrc
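# Edit the Kafka configuration file
vim server.properties
# Main parameters to change
# Giving each broker a distinct broker.id allows several brokers to run on one machine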
broker.id=0
# Port this broker listens on
listeners=PLAINTEXT://:9092
# Log (message data) directory; mind the free space under /tmp
log.dirs=/tmp/kafka-logs
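Since log.dirs points at /tmp here, it can help to check the free space of whatever directory the file actually names. The following is a minimal sketch, not part of Kafka: get_property is a hypothetical helper that reads one key from a properties file, and it assumes log.dirs holds a single path rather than a comma-separated list.

```shell
# Hypothetical helper (not shipped with Kafka): print the value of KEY
# from a Java-style properties FILE. Commented-out lines are ignored;
# if the key appears more than once, the last occurrence wins.
get_property() {
    local file="$1" key="$2"
    awk -F= -v key="$key" '
        $1 == key { sub(/^[^=]*=/, ""); value = $0 }
        END { if (value != "") print value }
    ' "$file"
}
```

Usage, assuming server.properties is in the current directory: `df -h "$(get_property server.properties log.dirs)"`.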
Start Kafka in the background with the configuration file modified above
kafka-server-start.sh server.properties &
jps -m
Create a topic
[root@node1 config]# kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic mytopic
[2020-01-03 10:27:32,680] INFO Creating topic mytopic with configuration {} and initial partition assignment Map(0 -> ArrayBuffer(0)) (kafka.zk.AdminZkClient)
[2020-01-03 10:27:32,785] INFO [ReplicaFetcherManager on broker 0] Removed fetcher for partitions Set(mytopic-0) (kafka.server.ReplicaFetcherManager)
[2020-01-03 10:27:32,793] INFO [Log partition=mytopic-0, dir=/home/hadoop/kafka_2.11-2.3.1/logs/kafka-logs] Loading producer state till offset 0 with message format version 2 (kafka.log.Log)
[2020-01-03 10:27:32,795] INFO [Log partition=mytopic-0, dir=/home/hadoop/kafka_2.11-2.3.1/logs/kafka-logs] Completed load of log with 1 segments, log start offset 0 and log end offset 0 in 7 ms (kafka.log.Log)
[2020-01-03 10:27:32,797] INFO Created log for partition mytopic-0 in /home/hadoop/kafka_2.11-2.3.1/logs/kafka-logs with properties {compression.type -> producer, message.downconversion.enable -> true, min.insync.replicas -> 1, segment.jitter.ms -> 0, cleanup.policy -> [delete], flush.ms -> 9223372036854775807, segment.bytes -> 1073741824, retention.ms -> 604800000, flush.messages -> 9223372036854775807, message.format.version -> 2.3-IV1, file.delete.delay.ms -> 60000, max.compaction.lag.ms -> 9223372036854775807, max.message.bytes -> 1000012, min.compaction.lag.ms -> 0, message.timestamp.type -> CreateTime, preallocate -> false, min.cleanable.dirty.ratio -> 0.5, index.interval.bytes -> 4096, unclean.leader.election.enable -> false, retention.bytes -> -1, delete.retention.ms -> 86400000, segment.ms -> 604800000, message.timestamp.difference.max.ms -> 9223372036854775807, segment.index.bytes -> 10485760}. (kafka.log.LogManager)
[2020-01-03 10:27:32,798] INFO [Partition mytopic-0 broker=0] No checkpointed highwatermark is found for partition mytopic-0 (kafka.cluster.Partition)
[2020-01-03 10:27:32,798] INFO Replica loaded for partition mytopic-0 with initial high watermark 0 (kafka.cluster.Replica)
[2020-01-03 10:27:32,798] INFO [Partition mytopic-0 broker=0] mytopic-0 starts at Leader Epoch 0 from offset 0. Previous Leader Epoch was: -1 (kafka.cluster.Partition)
List topics
[root@node1 bin]# kafka-topics.sh --list --bootstrap-server localhost:9092
mytopic
Delete a topic
kafka-topics.sh --delete --zookeeper localhost:2181 --topic test
Start a producer and type some messages
kafka-console-producer.sh --broker-list localhost:9092 --topic mytopic
Start a consumer
[root@node1 opt]# kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic mytopic --from-beginning
aa
vv
The single-node, single-broker setup is now complete
Setting up a multi-broker cluster
> cp config/server.properties config/server-1.properties
> cp config/server.properties config/server-2.properties
# Edit the corresponding configuration files as follows:
config/server-1.properties:
broker.id=1
listeners=PLAINTEXT://:9093
log.dirs=/tmp/kafka-logs-1
config/server-2.properties:
broker.id=2
listeners=PLAINTEXT://:9094
log.dirs=/tmp/kafka-logs-2
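The copy-and-edit steps above only change three keys per broker, so they can also be scripted. This is a rough sketch under stated assumptions: make_broker_config is a hypothetical helper, not a Kafka tool, and it follows the /tmp/kafka-logs-N naming used above.

```shell
# Hypothetical helper (not part of Kafka): derive a per-broker config
# from a base properties file.
# Usage: make_broker_config BASE_FILE BROKER_ID PORT OUT_FILE
make_broker_config() {
    local base="$1" id="$2" port="$3" out="$4"
    # Copy the base file, dropping any active (uncommented) lines for
    # the three keys that differ per broker.
    grep -Ev '^(broker\.id|listeners|log\.dirs)=' "$base" > "$out"
    # Append the per-broker values.
    {
        echo "broker.id=${id}"
        echo "listeners=PLAINTEXT://:${port}"
        echo "log.dirs=/tmp/kafka-logs-${id}"
    } >> "$out"
}
```

For example, `make_broker_config config/server.properties 1 9093 config/server-1.properties` should produce a file equivalent to the server-1.properties shown above.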
# Start each broker
> bin/kafka-server-start.sh config/server-1.properties &
...
> bin/kafka-server-start.sh config/server-2.properties &
...
[root@node1 config]#jps -m
30833 Kafka server.properties
3795 Kafka server-2.properties
28345 DFSZKFailoverController
3417 Kafka server-1.properties
Create a topic
kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 3 --partitions 1 --topic my-replicated-topic
List topics
[root@node1 config]# kafka-topics.sh --list --bootstrap-server localhost:9092
__consumer_offsets
my-replicated-topic
mytopic
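Show details
# 1 partition, 3 replicas
# The leader replica is on the broker with id=2
# Replicas are stored on the brokers with id=2,0,1
# Isr: ids of the in-sync brokers, here 2,0,1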
[root@node1 config]# kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic my-replicated-topic
Topic:my-replicated-topic PartitionCount:1 ReplicationFactor:3 Configs:segment.bytes=1073741824
Topic: my-replicated-topic Partition: 0 Leader: 2 Replicas: 2,0,1 Isr: 2,0,1
[root@node1 config]#
- “leader” is the node responsible for all reads and writes for the given partition. Each node will be the leader for a randomly selected portion of the partitions.
- “replicas” is the list of nodes that replicate the log for this partition regardless of whether they are the leader or even if they are currently alive.
- “isr” is the set of “in-sync” replicas. This is the subset of the replicas list that is currently alive and caught-up to the leader.
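To pull the three fields described above out of the --describe output in a script, something like the following may work. It is only a sketch: describe_field is a hypothetical helper, and it assumes the whitespace-separated "Key: value" layout shown in the output above.

```shell
# Hypothetical helper: read `kafka-topics.sh --describe` output on stdin
# and print the value of one per-partition field (Leader, Replicas or Isr),
# one line per partition.
describe_field() {
    local field="$1"
    awk -v key="${field}:" '
        /Partition:/ {
            # Each value is the token right after its "Key:" label.
            for (i = 1; i < NF; i++)
                if ($i == key) print $(i + 1)
        }'
}
```

Usage: `kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic my-replicated-topic | describe_field Isr`.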
Start a producer
[root@node1 config]# kafka-console-producer.sh --broker-list localhost:9092 --topic my-replicated-topic
>aaa
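Start consumers
Open several windows; each one shows the same messages (a console consumer started without --group gets its own consumer group, so every window receives every message)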
[root@node1 opt]# kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic my-replicated-topic
aaa
[root@node1 ~]# kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic my-replicated-topic
aaa
The single-node, multi-broker setup is now complete
fault-tolerance
[root@node1 config]#jps -m
30833 Kafka server.properties
3795 Kafka server-2.properties
28345 DFSZKFailoverController
3417 Kafka server-1.properties
[root@node1 ~]# kill -9 3417
[root@node1 ~]# kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic my-replicated-topic
Topic:my-replicated-topic PartitionCount:1 ReplicationFactor:3 Configs:segment.bytes=1073741824
Topic: my-replicated-topic Partition: 0 Leader: 2 Replicas: 2,0,1 Isr: 2,0
[root@node1 ~]#
# Kafka can still deliver messages
[root@node1 config]# kafka-console-producer.sh --broker-list localhost:9092 --topic my-replicated-topic
>default
>
[root@node1 opt]# kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic my-replicated-topic
aaa
aa
default
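After the kill, the describe output lists Replicas 2,0,1 but Isr 2,0, meaning broker 1 has dropped out of the in-sync set. The comparison can be sketched in shell as follows. out_of_sync is a hypothetical helper, not a Kafka command, and it takes the two comma-separated lists exactly as --describe prints them.

```shell
# Hypothetical helper: print the replica ids that are missing from the ISR,
# as a comma-separated list (empty output means all replicas are in sync).
# Usage: out_of_sync REPLICAS ISR
out_of_sync() {
    local replicas="$1" isr="$2" id missing=""
    local IFS=','
    for id in $replicas; do
        case ",${isr}," in
            *",${id},"*) ;;   # this replica is in the ISR
            *) missing="${missing:+${missing},}${id}" ;;
        esac
    done
    echo "$missing"
}

out_of_sync "2,0,1" "2,0"   # prints 1
```

kafka-topics.sh itself also accepts --under-replicated-partitions together with --describe, which limits the output to partitions in this state.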