Kafka Study Notes (2): Simple Installation

每天进步一奈奈

Article link: https://blog.csdn.net/haogenmin/article/details/108821887

Preparation

Prepare three virtual machines:

192.168.10.12
192.168.10.13
192.168.10.14

Install the JDK

Install it on every machine:

yum install java-1.8.0-openjdk

After installation, the JDK is placed under /usr/lib/jvm/ by default.

Verify:

[root@localhost ~]# java -version
openjdk version "1.8.0_262"
OpenJDK Runtime Environment (build 1.8.0_262-b10)
OpenJDK 64-Bit Server VM (build 25.262-b10, mixed mode)

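Download Kafka

URL: https://kafka.apache.org/downloads

Extract: tar -xzvf kafka_2.13-2.6.0.tgz

Move and rename: mv kafka_2.13-2.6.0 /usr/local/kafka

Kafka ships with its own bundled ZooKeeper.

Zookeeper

The script and configuration file names of the ZooKeeper bundled with Kafka differ slightly from those of a standalone ZooKeeper installation.

The bundled ZooKeeper is started and stopped with bin/zookeeper-server-start.sh and bin/zookeeper-server-stop.sh.

Its configuration file is config/zookeeper.properties, whose parameters can be edited.

Configuration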

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr/local/kafka/zookeeper/data
dataLogDir=/usr/local/kafka/zookeeper/log
clientPort=2181
server.1=192.168.10.12:2888:3888
server.2=192.168.10.13:2888:3888
server.3=192.168.10.14:2888:3888

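The parameters are explained below:

a). tickTime: in milliseconds (ms); the length of ZooKeeper's basic time unit. Many runtime intervals are expressed as multiples of tickTime.

b). initLimit: a positive integer N, meaning N times tickTime. It is the time the Leader allows a Follower to start up and finish data synchronization. During startup a Follower connects to the Leader and synchronizes its data, which determines the state from which it begins serving requests. The Leader allows the Follower up to initLimit to complete this work.

c). syncLimit: a positive integer N, meaning N times tickTime. It is the maximum delay allowed for heartbeat checks between the Leader and a Follower. If the Leader receives no heartbeat response from a Follower within syncLimit, it considers that Follower out of sync.

d). dataDir: the directory where the ZooKeeper server stores snapshot files. If dataLogDir is not set, transaction logs are stored in this directory as well. Because transaction-log write performance directly affects ZooKeeper's overall service capacity, it is recommended to set dataLogDir too, so that transaction logs get their own directory.

e). dataLogDir: the directory where the ZooKeeper server stores transaction log files. Both dataDir and dataLogDir must be readable and writable.

f). clientPort: the port this server exposes to clients; clients connect to ZooKeeper through it.

g). server.id=host:port:port: lists the machines that form the ZooKeeper ensemble. id is the ServerID and corresponds to the number in each server's myid file. Two ports are configured: the first is used by Followers for runtime communication and data synchronization with the Leader, and the second is used only for voting during Leader election. At startup, a ZooKeeper server reads the ServerID from its myid file to determine which server it is, and starts with the ports configured for that entry. If you deploy several ZooKeeper instances on one machine to form a pseudo-cluster, all of these ports must be different.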
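Because initLimit and syncLimit are counted in ticks, the real timeouts depend on tickTime. The following is a minimal sketch that works them out from a properties file; the helper names prop and limit_ms are made up for this illustration and are not part of Kafka or ZooKeeper.

```shell
# Read the value of one key from a properties file (first match wins).
# Comment lines start with '#', so they never equal the key.
prop() {
  # $1 = properties file, $2 = key
  awk -F= -v k="$2" '$1 == k { print $2; exit }' "$1"
}

# Convert a tick-based limit (initLimit or syncLimit) to milliseconds.
limit_ms() {
  # $1 = properties file, $2 = name of the limit
  local tick limit
  tick=$(prop "$1" tickTime)
  limit=$(prop "$1" "$2")
  echo $(( tick * limit ))
}

# Usage with the values from this article (tickTime=2000):
#   limit_ms config/zookeeper.properties initLimit   # 10 ticks -> 20000
#   limit_ms config/zookeeper.properties syncLimit   #  5 ticks -> 10000
```

With the configuration above, a Follower therefore has 20 seconds to connect and synchronize, and 10 seconds to answer a heartbeat.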

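Create the myid file

In the directory configured as dataDir, create a file named myid whose first line contains a single number, the ServerID, matching this machine's number in zoo.cfg (here, config/zookeeper.properties). For example, the myid file for server.1 contains "1". Make sure the number differs on every server and matches the id in that machine's server.id=host:port:port entry. Valid ids range from 1 to 255.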
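One way to keep myid consistent with the server.N entries is to derive it from the configuration file itself. This is only a sketch under the assumption that each machine is listed by the same IP you pass in; myid_for_host and write_myid are hypothetical helper names.

```shell
# Print the ServerID whose server.N entry points at the given host.
myid_for_host() {
  # $1 = properties file, $2 = host or IP of this machine
  awk -v host="$2" '
    /^server\.[0-9]+=/ {
      split($0, kv, "=")        # kv[1] = "server.N", kv[2] = "host:port:port"
      split(kv[2], addr, ":")   # addr[1] = host
      if (addr[1] == host) {
        sub(/^server\./, "", kv[1])
        print kv[1]
        exit
      }
    }' "$1"
}

# Write that ServerID into <dataDir>/myid, creating the directory if needed.
write_myid() {
  # $1 = properties file, $2 = host or IP of this machine
  local id data_dir
  id=$(myid_for_host "$1" "$2")
  if [ -z "$id" ]; then
    echo "no server.N entry for $2" >&2
    return 1
  fi
  data_dir=$(awk -F= '$1 == "dataDir" { print $2; exit }' "$1")
  mkdir -p "$data_dir" && echo "$id" > "$data_dir/myid"
}

# Usage on 192.168.10.12 (writes "1" to /usr/local/kafka/zookeeper/data/myid):
#   write_myid config/zookeeper.properties 192.168.10.12
```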

Command to start ZooKeeper

bin/zookeeper-server-start.sh -daemon config/zookeeper.properties

The -daemon option starts ZooKeeper in the background; its output is saved to logs/zookeeper.out under the Kafka installation directory.

Command to stop ZooKeeper

bin/zookeeper-server-stop.sh

Kafka configuration

Kafka's broker configuration lives in config/server.properties.

1. Edit server.properties
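#Globally unique broker ID; must not be repeated within the cluster
broker.id=1

#IP and port the broker listens on; producers and consumers connect here
listeners=PLAINTEXT://192.168.10.12:9092
advertised.listeners=PLAINTEXT://192.168.10.12:9092

#Number of threads handling network requests
num.network.threads=3

#Number of threads handling disk I/O
num.io.threads=8


#Socket send buffer size
socket.send.buffer.bytes=102400

#Socket receive buffer size
socket.receive.buffer.bytes=102400

#Maximum size of a request the socket server will accept
socket.request.max.bytes=104857600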

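#Directory where Kafka stores message data
log.dirs=/usr/local/kafka/kafka-logs

#Default number of partitions for a newly created topic
num.partitions=3

#Threads per data directory for log recovery at startup and flushing at shutdown
num.recovery.threads.per.data.dir=1

#Maximum time a segment file is retained; older segments are deleted
log.retention.hours=168

#Maximum time before a new segment file is rolled
log.roll.hours=168

#Size of each segment in a log; the default is 1 GB
log.segment.bytes=1073741824

#Interval at which segments are checked against the retention policy
log.retention.check.interval.ms=300000

#Whether the log cleaner is enabled
log.cleaner.enable=true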

#Brokers use ZooKeeper to store metadata
zookeeper.connect=192.168.10.12:2181,192.168.10.13:2181,192.168.10.14:2181

#ZooKeeper connection timeout
zookeeper.connection.timeout.ms=6000

#Flush to disk once this many messages have accumulated in a partition
log.flush.interval.messages=10000

#Flush to disk once a message has been buffered for this long (ms)
log.flush.interval.ms=3000

#Deleting a topic requires delete.topic.enable=true; otherwise the topic is only marked for deletion
delete.topic.enable=true

group.initial.rebalance.delay.ms=0

#Replication factors
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=3
transaction.state.log.min.isr=3
default.replication.factor=3
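One detail in the replication settings is worth checking. transaction.state.log.min.isr equals transaction.state.log.replication.factor here, so the transaction state log has no spare replica: once any one broker is down, transactional writes cannot be acknowledged. The sketch below flags that situation; prop and check_txn_isr are hypothetical helper names, and whether to lower min.isr (for example to 2) is a judgment call for your own cluster.

```shell
# Read the value of one key from a properties file (first match wins).
prop() {
  # $1 = properties file, $2 = key
  awk -F= -v k="$2" '$1 == k { print $2; exit }' "$1"
}

# Warn when the transaction log cannot tolerate a single broker failure.
# Returns 0 if there is headroom, 1 if not, 2 if a setting is missing.
check_txn_isr() {
  # $1 = server.properties
  local rf isr
  rf=$(prop "$1" transaction.state.log.replication.factor)
  isr=$(prop "$1" transaction.state.log.min.isr)
  if [ -z "$rf" ] || [ -z "$isr" ]; then
    echo "SKIP: setting not found in $1"
    return 2
  fi
  if [ "$isr" -ge "$rf" ]; then
    echo "WARN: min.isr=$isr >= replication.factor=$rf; one broker down blocks transactional writes"
    return 1
  fi
  echo "OK: min.isr=$isr < replication.factor=$rf"
}

# Usage:
#   check_txn_isr config/server.properties
```

Producers that do not use transactions are not affected by this setting.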

Producer

vim config/producer.properties

#Tell the producer which brokers are in the cluster
bootstrap.servers=192.168.10.12:9092,192.168.10.13:9092,192.168.10.14:9092

Consumer

vim config/consumer.properties

# list of brokers used for bootstrapping knowledge about the rest of the cluster
# format: host1:port1,host2:port2 ...
bootstrap.servers=192.168.10.12:9092,192.168.10.13:9092,192.168.10.14:9092

# consumer group id
group.id=haogenmin

One thing I noticed: in recent versions there is no ZooKeeper setting here, because consumers no longer register with ZooKeeper.

Copy Kafka to the other two machines.

scp -r /usr/local/kafka/ 192.168.10.13:/usr/local
scp -r /usr/local/kafka/ 192.168.10.14:/usr/local

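On each machine, change the broker id (broker.id), the ZooKeeper id (myid), and the broker's IP address in listeners and advertised.listeners.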
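The per-machine changes to server.properties can be scripted instead of edited by hand. This is a sketch only, assuming the listener lines use the PLAINTEXT://ip:port form shown earlier; broker_config is a hypothetical helper name.

```shell
# Print a copy of server.properties with broker.id and the listener IP replaced.
# Every other line is passed through unchanged.
broker_config() {
  # $1 = template server.properties, $2 = broker id, $3 = IP of the target machine
  sed -e "s|^broker\.id=.*|broker.id=$2|" \
      -e "s|^listeners=PLAINTEXT://[^:]*:|listeners=PLAINTEXT://$3:|" \
      -e "s|^advertised\.listeners=PLAINTEXT://[^:]*:|advertised.listeners=PLAINTEXT://$3:|" \
      "$1"
}

# Usage from 192.168.10.12, writing the result onto the second machine:
#   broker_config config/server.properties 2 192.168.10.13 \
#     | ssh 192.168.10.13 'cat > /usr/local/kafka/config/server.properties'
```

The myid file under the ZooKeeper dataDir still has to be changed separately on each machine.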

Startup

Start ZooKeeper

Start it on all three machines:

bin/zookeeper-server-start.sh -daemon config/zookeeper.properties

Start Kafka

Start it on all three machines:

bin/kafka-server-start.sh -daemon config/server.properties

Create a topic

bin/kafka-topics.sh --create --zookeeper 192.168.10.12:2181 --replication-factor 2 --partitions 3 --topic TEST_HGM

List topics

bin/kafka-topics.sh --list --zookeeper 192.168.10.12:2181

Describe the partitions

bin/kafka-topics.sh --describe --zookeeper 192.168.10.12:2181

Topic: TEST_HGM PartitionCount: 3       ReplicationFactor: 2    Configs: 
        Topic: TEST_HGM Partition: 0    Leader: 3       Replicas: 3,1   Isr: 3,1
        Topic: TEST_HGM Partition: 1    Leader: 1       Replicas: 1,2   Isr: 1,2
        Topic: TEST_HGM Partition: 2    Leader: 2       Replicas: 2,3   Isr: 2,3

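"leader" is the node responsible for all reads and writes for the given partition. Each node is the leader for a randomly selected share of the partitions.
"replicas" is the list of nodes that replicate the log for this partition, regardless of whether they are the leader or even whether they are currently alive.
"isr" is the set of "in-sync" replicas: the subset of the replicas list that is currently alive and caught up to the leader.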
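To see how leadership is spread across brokers, the describe output can be summarized with a small filter. This is a sketch that assumes the output layout shown above; leaders_per_broker is a hypothetical helper name.

```shell
# Read `kafka-topics.sh --describe` output on stdin and print
# "<broker id> <number of partitions it leads>", one broker per line.
leaders_per_broker() {
  awk '/Partition:/ {
         # The topic summary line contains "PartitionCount:", not "Partition:",
         # so only per-partition lines reach this block.
         for (i = 1; i < NF; i++)
           if ($i == "Leader:") count[$(i + 1)]++
       }
       END { for (b in count) print b, count[b] }' | sort -n
}

# Usage:
#   bin/kafka-topics.sh --describe --zookeeper 192.168.10.12:2181 | leaders_per_broker
```

For the output above, each of the three brokers leads exactly one partition.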

Produce messages

bin/kafka-console-producer.sh --broker-list 192.168.10.12:9092 --topic TEST_HGM

Consume messages

bin/kafka-console-consumer.sh --bootstrap-server 192.168.10.12:9092  --topic TEST_HGM

Fault tolerance: stop the Kafka service on node 192.168.10.13 (the stop script needs no arguments).

bin/kafka-server-stop.sh

Describing the topic again now shows:

Topic: TEST_HGM PartitionCount: 3       ReplicationFactor: 2    Configs: 
        Topic: TEST_HGM Partition: 0    Leader: 3       Replicas: 3,1   Isr: 3,1
        Topic: TEST_HGM Partition: 1    Leader: 1       Replicas: 1,2   Isr: 1
        Topic: TEST_HGM Partition: 2    Leader: 3       Replicas: 2,3   Isr: 3

The leader of partition 2 has changed to broker 3, and broker 2 has dropped out of the Isr lists of partitions 1 and 2.
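Partitions whose Isr list is shorter than their Replicas list are under-replicated. The following sketch picks them out of the describe output; under_replicated is a hypothetical helper name, and the parsing assumes the layout shown above.

```shell
# Read `kafka-topics.sh --describe` output on stdin and print the number of
# every partition that has fewer in-sync replicas than assigned replicas.
under_replicated() {
  awk '/Partition:/ {
         p = ""; nrep = 0; nisr = 0
         for (i = 1; i <= NF; i++) {
           if ($i == "Partition:") p = $(i + 1)
           if ($i == "Replicas:")  nrep = split($(i + 1), rep, ",")
           if ($i == "Isr:")       nisr = split($(i + 1), isr, ",")
         }
         if (nisr < nrep) print p
       }'
}

# Usage:
#   bin/kafka-topics.sh --describe --zookeeper 192.168.10.12:2181 | under_replicated
```

For the output above this prints partitions 1 and 2, the two partitions that have a replica on the stopped broker. kafka-topics.sh also offers an --under-replicated-partitions option for --describe that filters on the server side.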

After restarting broker 2, it rejoins the Isr lists but does not become a leader right away; after some time, leadership is reassigned.

Topic: TEST_HGM PartitionCount: 3       ReplicationFactor: 2    Configs: 
        Topic: TEST_HGM Partition: 0    Leader: 3       Replicas: 3,1   Isr: 3,1
        Topic: TEST_HGM Partition: 1    Leader: 1       Replicas: 1,2   Isr: 1,2
        Topic: TEST_HGM Partition: 2    Leader: 3       Replicas: 2,3   Isr: 3,2
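The first broker in a partition's Replicas list is its preferred leader, so a partition whose Leader differs from that first entry has not yet been handed back. The sketch below lists such partitions; non_preferred_leaders is a hypothetical helper name, and the parsing assumes the describe layout shown above.

```shell
# Read `kafka-topics.sh --describe` output on stdin and print the number of
# every partition whose current leader is not the first replica in its list.
non_preferred_leaders() {
  awk '/Partition:/ {
         p = ""; leader = ""; first = ""
         for (i = 1; i <= NF; i++) {
           if ($i == "Partition:") p = $(i + 1)
           if ($i == "Leader:")    leader = $(i + 1)
           if ($i == "Replicas:")  { split($(i + 1), rep, ","); first = rep[1] }
         }
         if (leader != first) print p
       }'
}

# Usage:
#   bin/kafka-topics.sh --describe --zookeeper 192.168.10.12:2181 | non_preferred_leaders
```

For the output above this prints 2: partition 2 is still led by broker 3 although broker 2 comes first in its replica list. The later automatic hand-back is governed by the broker setting auto.leader.rebalance.enable, which is on by default.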

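Using Kafka Connect to import and export data

Reading data from the console and writing it back is convenient, but you will probably want to use data from other sources or export data from Kafka to other systems. For such systems you can use Kafka Connect to import or export data instead of writing custom integration code.

Kafka Connect is a tool included with Kafka that imports data into Kafka and exports data from it. It is an extensible tool that runs connectors, which implement the custom logic for interacting with an external system. In this article we will see how to run Kafka Connect with simple connectors that import data from a file into a Kafka topic and export data from that topic to a file.

First, create some seed data for testing, which simply means creating a file. Why test.txt? Because that is the file name used in the default connector configuration:

> echo -e "foo\nbar" > test.txt

Next, start two connectors in standalone mode, which means they run in a single, local, dedicated process. Three configuration files are passed as parameters. The first is the configuration of the Kafka Connect process itself, containing common settings such as the Kafka brokers to connect to and the serialization format for data. Each of the remaining files specifies one connector to create. These files include a unique connector name, the connector class to instantiate, and any other configuration the connector requires.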

> bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties config/connect-file-sink.properties

First, a look at the default configuration files.

connect-standalone.properties

Change the ip:port here:

# These are defaults. This file just demonstrates how to override some settings.
bootstrap.servers=192.168.10.12:9092,192.168.10.13:9092,192.168.10.14:9092

connect-file-source.properties

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=test.txt
topic=connect-test

connect-file-sink.properties

name=local-file-sink
connector.class=FileStreamSink
tasks.max=1
file=test.sink.txt
topics=connect-test

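These sample configuration files ship with Kafka (only bootstrap.servers was changed above to point at this cluster) and create two connectors. The first is a source connector that reads lines from an input file and produces each one to a Kafka topic. The second is a sink connector that reads messages from a Kafka topic and writes each one as a line in an output file.

During startup you will see a number of log messages, including some indicating that the connectors are being instantiated. Once the Kafka Connect process has started, the source connector begins reading lines from test.txt and producing them to the topic connect-test, and the sink connector begins reading messages from the topic connect-test and writing them to the file test.sink.txt. We can verify that the data has been delivered through the entire pipeline by examining the contents of the output file: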

[root@localhost kafka]# cat test.sink.txt 
foo
bar

The data is stored in the Kafka topic connect-test, so we can also run a console consumer to see the data in the topic (or process it with custom consumer code):

bin/kafka-console-consumer.sh --bootstrap-server 192.168.10.12:9092 --topic connect-test --from-beginning

Now append new data to the file:

[root@localhost kafka]# echo Another line>> test.txt
[root@localhost kafka]# echo 我喜欢后弦>> test.txt

The consumer receives the data:

[root@localhost kafka]# bin/kafka-console-consumer.sh --bootstrap-server 192.168.10.12:9092 --topic connect-test --from-beginning
{"schema":{"type":"string","optional":false},"payload":"foo"}
{"schema":{"type":"string","optional":false},"payload":"bar"}
{"schema":{"type":"string","optional":false},"payload":"Another line"}
{"schema":{"type":"string","optional":false},"payload":"我喜欢后弦"}
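Each record is wrapped in a schema/payload envelope because the JSON converter in the shipped connect-standalone.properties has schemas enabled (value.converter.schemas.enable=true). If only the original lines are wanted, the payload can be pulled out with a small filter. This is a sketch for plain string payloads like the ones above and does not unescape JSON; payloads is a hypothetical helper name.

```shell
# Read envelope records on stdin and print only the payload string of each.
# Lines that do not match the envelope shape are dropped.
payloads() {
  sed -n 's/^{"schema":{.*},"payload":"\(.*\)"}$/\1/p'
}

# Usage:
#   bin/kafka-console-consumer.sh --bootstrap-server 192.168.10.12:9092 \
#     --topic connect-test --from-beginning | payloads
```

For anything beyond simple strings, a real JSON parser is the safer choice.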