How to Integrate Kafka Connect with MySQL Server on the Command Line Interface over a Multi-Node Multi-Broker Architecture


Before we start, one should look at the installation of Kafka on the system. As in the Kafka installation blog, we will be using Ubuntu 18.04 to execute our steps. We will proceed step by step, starting with the creation of a multi-node, multi-broker architecture over Docker.


1. Installation of Docker

First, we need to install Docker on our system. It is advisable to use the 'sudo' command in every step, as it provides us with administrator privileges. One must know the system administrator's password in order to use 'sudo'.


$ sudo apt-get update
$ sudo apt-get install docker.io docker-compose
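As an optional sanity check (assuming a systemd-based Ubuntu install), one can confirm that the Docker daemon is running and that both tools are available on the PATH:

$ sudo systemctl status docker
$ docker --version
$ docker-compose --version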

2. Creation of the ".yml" file

Now create a directory named "Docker_Kafka":


$ mkdir Docker_Kafka
$ cd Docker_Kafka

Now we need to write a YAML file named "docker-compose.yml" containing the following code:


version: '2'
services:
  zookeeper-1:
    image: confluentinc/cp-zookeeper:latest
    hostname: zookeeper-1
    ports:
      - "12181:12181"
    environment:
      ZOOKEEPER_SERVER_ID: 1
      ZOOKEEPER_CLIENT_PORT: 12181
      ZOOKEEPER_TICK_TIME: 2000
      ZOOKEEPER_INIT_LIMIT: 5
      ZOOKEEPER_SYNC_LIMIT: 2
      ZOOKEEPER_SERVERS: zookeeper-1:12888:13888;zookeeper-2:22888:23888;zookeeper-3:32888:33888
  zookeeper-2:
    image: confluentinc/cp-zookeeper:latest
    hostname: zookeeper-2
    ports:
      - "22181:22181"
    environment:
      ZOOKEEPER_SERVER_ID: 2
      ZOOKEEPER_CLIENT_PORT: 22181
      ZOOKEEPER_TICK_TIME: 2000
      ZOOKEEPER_INIT_LIMIT: 5
      ZOOKEEPER_SYNC_LIMIT: 2
      ZOOKEEPER_SERVERS: zookeeper-1:12888:13888;zookeeper-2:22888:23888;zookeeper-3:32888:33888
  zookeeper-3:
    image: confluentinc/cp-zookeeper:latest
    hostname: zookeeper-3
    ports:
      - "32181:32181"
    environment:
      ZOOKEEPER_SERVER_ID: 3
      ZOOKEEPER_CLIENT_PORT: 32181
      ZOOKEEPER_TICK_TIME: 2000
      ZOOKEEPER_INIT_LIMIT: 5
      ZOOKEEPER_SYNC_LIMIT: 2
      ZOOKEEPER_SERVERS: zookeeper-1:12888:13888;zookeeper-2:22888:23888;zookeeper-3:32888:33888
  kafka-1:
    image: confluentinc/cp-kafka:latest
    hostname: kafka-1
    ports:
      - "19092:19092"
    depends_on:
      - zookeeper-1
      - zookeeper-2
      - zookeeper-3
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper-1:12181,zookeeper-2:22181,zookeeper-3:32181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka-1:19092
  kafka-2:
    image: confluentinc/cp-kafka:latest
    hostname: kafka-2
    ports:
      - "29092:29092"
    depends_on:
      - zookeeper-1
      - zookeeper-2
      - zookeeper-3
    environment:
      KAFKA_BROKER_ID: 2
      KAFKA_ZOOKEEPER_CONNECT: zookeeper-1:12181,zookeeper-2:22181,zookeeper-3:32181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka-2:29092
  kafka-3:
    image: confluentinc/cp-kafka:latest
    hostname: kafka-3
    ports:
      - "39092:39092"
    depends_on:
      - zookeeper-1
      - zookeeper-2
      - zookeeper-3
    environment:
      KAFKA_BROKER_ID: 3
      KAFKA_ZOOKEEPER_CONNECT: zookeeper-1:12181,zookeeper-2:22181,zookeeper-3:32181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka-3:39092

Basically, we are creating three ZooKeeper nodes on port numbers 12181, 22181 and 32181. On top of them we are creating three Kafka brokers on port numbers 19092, 29092 and 39092. On execution of the above .yml file as shown below, the Kafka nodes will elect one leader broker from among the provided brokers. The other brokers will act as followers. Whenever a producer-consumer process takes place, it is the leader's job to decide where to store the data and how much of it needs to be stored at each location. The leader acts according to the replication factor and the number of partitions provided by the developer. If for some reason the leader broker crashes, one of the follower brokers is elected as the new leader and takes over from where the previous leader failed. We need to add the kafka-1, kafka-2 and kafka-3 broker hosts to the client's /etc/hosts file. Use the following command:


$ sudo gedit /etc/hosts

The hosts file looks something like the snippet below. The kafka-1/kafka-2/kafka-3 entry is the modification one has to make in the file. The first few lines are shown below:


127.0.0.1       localhost
192.168.1.231   kafka-1 kafka-2 kafka-3    # if running on a different machine
# or
127.0.0.1       kafka-1 kafka-2 kafka-3    # if running on the same machine

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
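As an optional check, one can confirm that the new entries resolve as expected by querying the local resolver:

$ getent hosts kafka-1 kafka-2 kafka-3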

Now open a terminal with the "Docker_Kafka" directory as the working directory and run:


$ sudo docker-compose up

It will take some time to bring everything up. Make sure the internet connection is on, as Docker might need to download some of the images.
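Once everything is up, a quick way to verify the containers (from another terminal, in the same "Docker_Kafka" directory so docker-compose finds the compose file) is:

$ sudo docker-compose ps

All three ZooKeeper services and all three Kafka services should show as Up.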


3. Create a Database to act as a Data Source

Let's create a table named employee to serve as our own data source. We will keep it as simple as possible. After logging into the MySQL server, create a database named 'exp':


> CREATE DATABASE exp;
> USE exp;

Now create a table inside the 'exp' database. For learning purposes, let's keep the schema of the table as simple as possible. We need one column to be AUTO_INCREMENT: the Kafka connector observes only this column to detect changes in the database and publishes the new rows from the data source to the Kafka consumers. In our example, let us make the 'eid' column both the primary key and the auto-increment column.


> CREATE TABLE employee(
eid int(11) NOT NULL AUTO_INCREMENT,
ename varchar(20) DEFAULT NULL,
esal int(8) DEFAULT NULL,
edep varchar(20) DEFAULT NULL,
PRIMARY KEY (`eid`)
);

4. Creation of .properties files for the connection to the Data Source

In order to set up the architecture, one first has to set up the multi-broker architecture using docker-compose as in step (2). Apart from this, we need to set up two more properties files. The first properties file is "mysql.properties" and the second is "worker.properties". The names of these properties files need not be exactly these. The 'worker.properties' file contains the configuration and constraints the Kafka connector requires in order to understand the architecture it is running on, which brokers it needs to publish the data to, and in what format. The other file, 'mysql.properties', contains the various settings for the data source used to fetch the data.


worker.properties:


# common worker configs
bootstrap.servers=127.0.0.1:29092, 127.0.0.1:39092, 127.0.0.1:19092
key.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=false
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter.schemas.enable=false
# this config is only for standalone workers
offset.storage.file.filename=standalone.offsets
offset.flush.interval.ms=10000

In the worker.properties file, we set the various values the connector requires before connecting to the brokers. For example, 'bootstrap.servers' stores the IP address and port number of every broker running in the architecture. The data will be published as key-value pairs, so we can also specify the format in which we want the data to be consumed. Most systems use the JSON format; apart from it, we can even use the Avro format for consuming the data. The 'standalone.offsets' file is where a standalone worker stores the offsets it has already processed from the data source.


mysql.properties:


name=test-source-mysql-jdbc-autoincrement
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
tasks.max=1
connection.url=jdbc:mysql://127.0.0.1:3306/exp?useSSL=false
connection.password=xander21
connection.user=XANDER
mode=incrementing
incrementing.column.name=eid
topic.prefix=Topic123-

The user name, database name, and password are the adjustments one has to make before connecting to the database; the username and password vary from person to person. 'incrementing.column.name' is the strictly incrementing column in the table that the connector uses to detect new rows. This column must not be nullable. If one does not have a column with these properties, one has to update one of the columns with the following SQL commands:


ALTER TABLE <table_name> ADD PRIMARY KEY (<column_name>);
ALTER TABLE <table_name> MODIFY COLUMN <column_name> INT NOT NULL AUTO_INCREMENT;

topic.prefix: the prefix to prepend to table names to generate the name of the Kafka topic to publish data to, or, in the case of a custom query, the full name of the topic to publish to. Example: if topic.prefix=test- and the name of the table is employee, then the topic name to which the connector publishes messages would be test-employee.


5. Download the MySQL Connector

Now we have to download MySQL Connector for Java (Connector/J). It is required by the connector in order to connect to the MySQL database. We can download the .jar file from the MySQL Connector/J download page. Then we need to copy the MySQL Connector jar file next to the existing Kafka Connect JDBC jars. On an Ubuntu system we can find them in /usr/share/java/kafka-connect-jdbc.
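A minimal sketch of the copy step, assuming the Connector/J archive has already been downloaded and extracted into the current directory (the exact jar file name depends on the version):

$ sudo cp mysql-connector-java-*.jar /usr/share/java/kafka-connect-jdbc/
$ ls /usr/share/java/kafka-connect-jdbc/ | grep mysql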


6. Generate a topic inside the Kafka Broker

Let's generate a topic using the Kafka topic API against any one of the Kafka brokers, with multiple partitions and a replication factor greater than one. If there are n brokers and n replicas, no two replicas of the same partition will be stored on the same broker. If the leader broker fails to provide support, one of the follower brokers with its own replica of the data becomes the next leader; its replica is then called the primary replica and the other replicas become the secondary replicas. The secondary replicas keep themselves up to date with respect to the primary replica. This leader-follower architecture is what makes Kafka fail-safe and robust in nature.


We are using Topic123-employee as the name of the topic because 'Topic123-' is the prefix we specified in the mysql.properties file. Thus, which data comes from which data source is determined by the group of topics sharing the same prefix. Within the same group of topics, if a certain consumer wants to display data only from a certain table, that table name, 'employee' here, is added as the suffix. The suffix of the topic name determines from which particular table the consumer is consuming data; in other words, the Kafka broker knows which consumer process it must send the data to when that consumer subscribes to the particular topic.


$usr/bin> kafka-topics --create --bootstrap-server localhost:39092 --replication-factor 3 --partitions 3 --topic Topic123-employee
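After creating the topic, one can check how the partitions, the replicas, and the elected leader for each partition were assigned across the three brokers (an optional check, using the same bootstrap server):

$usr/bin> kafka-topics --describe --bootstrap-server localhost:39092 --topic Topic123-employee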

7. Execution of the Kafka Connect API

Execution begins from this part. Start the ZooKeeper nodes and Kafka brokers using Docker. First change into the directory containing the docker-compose file in the terminal, then run the command below. Let both the properties files and the docker-compose file be in a folder named kafka, so the path is:


$ cd /kafka
$kafka> sudo docker-compose up

Then, in another terminal, run the connector, i.e., the standalone Connect worker, with the two properties files. Change into /usr/bin first:


$ cd /usr/bin
$usr/bin> sudo connect-standalone /kafka/worker.properties /kafka/mysql.properties

The above command will take some time to work. After the connection is established, start a consumer process against any one of the brokers with the topic "Topic123-employee", and then try to insert data into the employee table in the 'exp' database, as configured in the mysql.properties file.


$usr/bin> kafka-console-consumer --bootstrap-server 127.0.0.1:19092 --topic Topic123-employee --from-beginning

Now insert data into the database and observe the consumer process in the terminal. To insert the data we can write:


> INSERT INTO employee (ename, esal, edep) VALUES ('Barney', 32596, 'Big Data Developer');

We haven't inserted a value into the eid ('employee id') column, as its value is filled in automatically with each new insertion in an incrementing manner. The above-mentioned process is demonstrated in the video below.
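Since worker.properties disables schemas on the JSON converters, each new row should arrive at the consumer as a plain JSON object, roughly along these lines (an illustrative sketch; field order and eid value depend on the actual table state):

{"eid":1,"ename":"Barney","esal":32596,"edep":"Big Data Developer"}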


Integration of the Kafka Connector with the MySQL Database

Hopefully the above blog helps you; keep learning data architecture.


Translated from: https://medium.com/swlh/how-to-intergrate-kafka-connect-with-mysql-server-on-command-line-interface-over-multi-node-f9630a7b3b72
