Setting Up Confluent Platform 6.0
Hi everyone, I'm the man who can smash a car's A-pillar with one punch.
I ran into plenty of problems while setting up Confluent myself, and I'd rather you didn't have to step into every one of those pits, so here is my setup walkthrough. Do read the official documentation as well: the files there are messy, but they really do cover the overall setup process. I suggest following this post side by side with the official docs!
1. Environment

| | Version | Notes |
|---|---|---|
| Linux | CentOS 6.10 | Memory: 2 GB |
| JDK | 1.8.0_141 | |
2. Downloading Confluent 6.0
Go to the Confluent website and click GET STARTED FREE in the top-right corner to reach the download page. Scroll down to Download Confluent Platform (Figure 1), enter your email, select Manual, and choose tar as the File Type to download.
Upload the confluent-6.0.0.tar.gz archive to the server.
3. Extracting the Archive and Editing the Config Files
List the files in the directory on Linux:
[root@spark-03 apps]# ls
confluent-6.0.0.tar.gz
Extract the archive into the current directory:
[root@spark-03 apps]# tar -zxvf confluent-6.0.0.tar.gz
Change into confluent-6.0.0:
[root@spark-03 apps]# cd confluent-6.0.0/
Inspect the Confluent directory structure:
[root@spark-03 confluent-6.0.0]# ls
bin etc lib README share src
Configure the Kafka and ZooKeeper data paths
Create a data directory under the install root:
[root@spark-03 confluent-6.0.0]# mkdir data
[root@spark-03 confluent-6.0.0]# cd data
[root@spark-03 data]# mkdir zkdata kafkadata
[root@spark-03 data]# ls
kafkadata zkdata
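The mkdir/cd sequence above can be collapsed into a single command; a small sketch, run from the confluent-6.0.0 directory (bash brace expansion):

```shell
# One-step equivalent of the mkdir/cd sequence above (bash brace expansion).
mkdir -p data/{zkdata,kafkadata}
```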
Copy the path of the zkdata folder, then edit the …/confluent-6.0.0/etc/kafka/zookeeper.properties file:
[root@spark-03 confluent-6.0.0]# cd etc/kafka
[root@spark-03 kafka]# vi zookeeper.properties
The contents of zookeeper.properties:
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# the directory where the snapshot is stored.
dataDir=/tmp/zookeeper
# the port at which the clients will connect
clientPort=2181
# disable the per-ip limit on the number of connections since this is a non-production config
maxClientCnxns=0
# Disable the adminserver by default to avoid port conflicts.
# Set the port to something non-conflicting if choosing to enable this
admin.enableServer=false
# admin.serverPort=8080
Change the dataDir path:
dataDir=/.../confluent-6.0.0/data/zkdata
Edit the server.properties file:
[root@spark-03 kafka]# vi server.properties
Search for log.dirs in vi (or scroll to it manually):
/log.dirs
Update the path accordingly:
log.dirs=/.../confluent-6.0.0/data/kafkadata
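If you prefer not to edit the files by hand, both path changes can be scripted; a sketch, assuming CONFLUENT_HOME points at your unpacked install (the default path below is a placeholder, not from this setup):

```shell
# Rewrite the two data paths in place with sed.
# CONFLUENT_HOME is an assumed placeholder -- point it at your real install.
CONFLUENT_HOME=${CONFLUENT_HOME:-/opt/apps/confluent-6.0.0}
ZK_CONF="$CONFLUENT_HOME/etc/kafka/zookeeper.properties"
KAFKA_CONF="$CONFLUENT_HOME/etc/kafka/server.properties"
if [ -f "$ZK_CONF" ]; then
  sed -i "s|^dataDir=.*|dataDir=$CONFLUENT_HOME/data/zkdata|" "$ZK_CONF"
fi
if [ -f "$KAFKA_CONF" ]; then
  sed -i "s|^log.dirs=.*|log.dirs=$CONFLUENT_HOME/data/kafkadata|" "$KAFKA_CONF"
fi
```

Using `|` as the sed delimiter avoids having to escape the slashes in the paths.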
4. Starting the Confluent Platform
Go to Confluent's bin directory and start the platform. The command below may report a start failure, yet checking the processes shows the service actually came up; so whenever a service "fails" to start, check whether it is really running, and if it is, simply rerun the command until every service is up:
- First attempt fails:
[root@spark-03 bin]# ./confluent local services start
The local commands are intended for a single-node development environment only,
NOT for production usage. https://docs.confluent.io/current/cli/index.html
Using CONFLUENT_CURRENT: /tmp/confluent.ktr9tuJJ
Starting ZooKeeper
Error: ZooKeeper failed to start
- Check the processes:
[root@spark-03 bin]# jps
1445 QuorumPeerMain
1494 Jps
- Second attempt: ZooKeeper comes up, Kafka fails:
[root@spark-03 bin]# ./confluent local services start
The local commands are intended for a single-node development environment only,
NOT for production usage. https://docs.confluent.io/current/cli/index.html
Using CONFLUENT_CURRENT: /tmp/confluent.727822
ZooKeeper is [UP]
Starting Kafka
Error: Kafka failed to start
- Check the processes again:
[root@spark-03 bin]# jps
1536 Kafka
1699 Jps
1445 QuorumPeerMain
Repeat until all services are up (the whole process can take ten minutes or more):
[root@spark-03 bin]# ./confluent local services start
The local commands are intended for a single-node development environment only,
NOT for production usage. https://docs.confluent.io/current/cli/index.html
Using CONFLUENT_CURRENT: /tmp/confluent.727822
ZooKeeper is [UP]
Kafka is [UP]
Schema Registry is [UP]
Kafka REST is [UP]
Connect is [UP]
ksqlDB Server is [UP]
Control Center is [UP]
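The start-check-restart dance above can be wrapped in a small retry loop; a sketch, run from the bin directory (MAX_TRIES and the sleep interval are arbitrary choices, and the loop keys on Control Center because it is the last service in the chain):

```shell
# Keep calling `confluent local services start` until Control Center,
# the last service to come up, reports [UP] -- or until we give up.
MAX_TRIES=10
if [ -x ./confluent ]; then
  try=0
  until ./confluent local services status 2>/dev/null | grep -q 'Control Center is \[UP\]'; do
    try=$((try + 1))
    if [ "$try" -ge "$MAX_TRIES" ]; then break; fi
    ./confluent local services start || true
    sleep 5
  done
fi
```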
These are the processes of a fully running Confluent Platform, but my machine is too weak to keep that many processes alive; so if all you need is standalone Kafka, the following services can be shut down:
- Check memory (I suggest giving the VM more than 3 GB):
[root@spark-03 bin]# free -h
total used free shared buffers cached
Mem: 1.9G 1.9G 59M 0B 1.8M 28M
-/+ buffers/cache: 1.9G 89M
Swap: 1.0G 544M 479M
- Look up the PIDs; I recommend killing ConnectDistributed, KsqlServerMain, and ControlCenter:
[root@spark-03 bin]# jps
1536 Kafka
1873 ConnectDistributed
1810 KafkaRestMain
1986 KsqlServerMain
1445 QuorumPeerMain
2585 Jps
1741 SchemaRegistryMain
2063 ControlCenter
[root@spark-03 bin]# kill 1873 1986 2063
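Instead of copying PIDs by hand each time, the three services can be killed by name; a sketch that assumes jps (shipped with the JDK) is on the PATH:

```shell
# Pull the PIDs of the heavyweight services out of jps and kill them.
# `xargs -r` skips the kill entirely when nothing matches.
if command -v jps >/dev/null 2>&1; then
  jps | awk '/ConnectDistributed|KsqlServerMain|ControlCenter/ {print $1}' | xargs -r kill
fi
```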
5. Configuring Confluent Connectors
Confluent provides connectors for hundreds of data sources, and connecting to Dameng 8 (DM8) works like any other JDBC source, so go to JDBC Connector (Source and Sink) for Confluent Platform and install the JDBC connector following the official steps. That said, Confluent components do not have to be installed through confluent-hub; you can also download the connector from Confluent Hub and upload it to the server yourself.
Choose either 5.1 or 5.2.
5.1 Installing the JDBC connector with confluent-hub
Confluent Platform 6.0 already ships with confluent-hub; run this command from the bin directory:
[root@spark-03 bin]# ./confluent-hub install confluentinc/kafka-connect-jdbc:latest
# Note: you can also pin a version, e.g. kafka-connect-jdbc:10.0.0
# Choose the install directory: 1 follows $CONFLUENT_HOME, 2 follows the tool's own path; here they are the same
The component can be installed in any of the following Confluent Platform installations:
1. /.../confluent-6.0.0 (based on $CONFLUENT_HOME)
2. /.../confluent-6.0.0 (where this tool is installed)
Choose one of these to continue the installation (1-2): 1
Do you want to install this into /.../confluent-6.0.0/share/confluent-hub-components? (yN) y
Component's license:
Confluent Community License
https://www.confluent.io/confluent-community-license
I agree to the software license agreement (yN) y
Downloading component Kafka Connect JDBC 10.0.1, provided by Confluent, Inc. from Confluent Hub and installing into /.../confluent-6.0.0/share/confluent-hub-components
Detected Worker's configs:
1. Standard: /.../confluent-6.0.0/etc/kafka/connect-distributed.properties
2. Standard: /.../confluent-6.0.0/etc/kafka/connect-standalone.properties
3. Standard: /.../confluent-6.0.0/etc/schema-registry/connect-avro-distributed.properties
4. Standard: /.../confluent-6.0.0/etc/schema-registry/connect-avro-standalone.properties
5. Based on CONFLUENT_CURRENT: /tmp/confluent.727822/connect/connect.properties
# Choose whether to update the plugin path in all five detected config files
Do you want to update all detected configs? (yN) y
Adding installation directory to plugin path in the following files:
/.../confluent-6.0.0/etc/kafka/connect-distributed.properties
/.../confluent-6.0.0/etc/kafka/connect-standalone.properties
/.../confluent-6.0.0/etc/schema-registry/connect-avro-distributed.properties
/.../confluent-6.0.0/etc/schema-registry/connect-avro-standalone.properties
/tmp/confluent.727822/connect/connect.properties
Completed
That is the whole confluent-hub installation of the JDBC connector.
5.2 Downloading the JDBC connector from the website
You can also get the connector from the website: go to Confluent Hub and type jdbc into the search box:
After downloading, just upload it to the server.
5.3 Configuring the JDBC connector files and paths
Go to /…/confluent-6.0.0/share/confluent-hub-components and check the JDBC connector:
[root@spark-03 confluent-hub-components]# ls
confluentinc-kafka-connect-jdbc
Enter the folder and list its contents:
[root@spark-03 confluentinc-kafka-connect-jdbc]# ls
assets doc etc lib manifest.json
- lib holds the dependency JARs; it is also where the DM8 driver will go later
- etc holds the related configuration files
First make sure the plugin path is correct. Go to /…/confluent-6.0.0/etc/schema-registry:
5.3.1 Configuring the plugin path
Open connect-avro-standalone.properties and make sure its plugin.path includes /…/confluent-6.0.0/share/confluent-hub-components, like this:
plugin.path=share/java,/.../confluent-6.0.0/share/confluent-hub-components
You can also point it more precisely at the lib directory.
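A quick way to confirm the worker config really picks up the components directory is to grep for it; a small sketch, run from the schema-registry config directory:

```shell
# Report whether plugin.path in the worker config mentions confluent-hub-components.
CONF=connect-avro-standalone.properties
if [ -f "$CONF" ]; then
  if grep '^plugin.path' "$CONF" | grep -q 'confluent-hub-components'; then
    echo "plugin.path OK"
  else
    echo "plugin.path is missing confluent-hub-components"
  fi
fi
```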
5.3.2 Configuring the data source
Go to /…/confluent-6.0.0/share/confluent-hub-components/confluentinc-kafka-connect-jdbc/etc and copy the SQLite sample source config:
[root@spark-03 etc]# cp source-quickstart-sqlite.properties source-dm8.properties
Edit the parameters in source-dm8.properties:
# name must be unique
name=test-source-dm-jdbc
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
tasks.max=1
connection.url=jdbc:dm://IP:5236/jc?user=USERNAME&password=PASSWORD&characterEncoding=utf-8
# table(s) to query
table.whitelist=kmeans2
# incremental query mode
mode=incrementing
# column the incremental mode keys on
incrementing.column.name=id
# prefix for the auto-created topic
topic.prefix=test-dm-jc-
6. Uploading the DM8 Driver and Starting connect-standalone
Put the DM8 JDBC driver into /…/confluent-6.0.0/share/confluent-hub-components/confluentinc-kafka-connect-jdbc/lib.
Then go to Confluent's bin directory and start connect-standalone:
[root@spark-03 bin]# ./connect-standalone /.../confluent-6.0.0/etc/schema-registry/connect-avro-standalone.properties /.../confluent-6.0.0/share/confluent-hub-components/confluentinc-kafka-connect-jdbc/etc/source-dm8.properties
If everything is configured correctly, you will see output like this:
[2020-12-10 01:42:20,071] INFO Using JDBC dialect Generic (io.confluent.connect.jdbc.source.JdbcSourceTask:102)
[2020-12-10 01:42:20,380] INFO Attempting to open connection #1 to Generic (io.confluent.connect.jdbc.util.CachedConnectionProvider:82)
[2020-12-10 01:42:20,858] INFO Started JDBC source task (io.confluent.connect.jdbc.source.JdbcSourceTask:261)
[2020-12-10 01:42:20,858] INFO WorkerSourceTask{id=test-source-dm-jdbc-0} Source task finished initialization and start (org.apache.kafka.connect.runtime.WorkerSourceTask:233)
[2020-12-10 01:42:20,939] INFO Begin using SQL query: SELECT * FROM "JC"."kmeans2" WHERE "JC"."kmeans2"."id" > ? ORDER BY "JC"."kmeans2"."id" ASC (io.confluent.connect.jdbc.source.TableQuerier:164)
The auto-generated SQL query shows that the configuration works.
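To see the rows actually flowing, you can consume the auto-created topic from another terminal. The topic name is topic.prefix plus the table name (test-dm-jc-kmeans2 with the config above); a sketch, run from the bin directory, assuming the default local Kafka and Schema Registry ports:

```shell
# Derive the topic name from the connector config, then tail it with the
# Avro console consumer (Schema Registry assumed on its default port 8081).
PROPS=../share/confluent-hub-components/confluentinc-kafka-connect-jdbc/etc/source-dm8.properties
if [ -f "$PROPS" ]; then
  TOPIC=$(awk -F= '/^topic.prefix/ {p=$2} /^table.whitelist/ {t=$2} END {print p t}' "$PROPS")
  ./kafka-avro-console-consumer --bootstrap-server localhost:9092 \
    --topic "$TOPIC" --from-beginning \
    --property schema.registry.url=http://localhost:8081
fi
```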