start slave;
show slave status\G
select user,host from mysql.user;
select * from test.t1;
Output:
mysql> change master to
-> master_host='172.18.16.156',
-> master_port=3306,
-> master_user='repl',
-> master_password='123456',
-> master_log_file='mysql-bin.000001',
-> master_log_pos=977;
Query OK, 0 rows affected, 2 warnings (0.00 sec)
mysql> start slave;
Query OK, 0 rows affected (0.01 sec)
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 172.18.16.156
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: 2431
Relay_Log_File: vvgg-z2-music-mysqld-relay-bin.000002
Relay_Log_Pos: 1776
Relay_Master_Log_File: mysql-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 2431
Relay_Log_Space: 1999
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1563306
Master_UUID: ba615057-e11c-11ee-b80e-246e961c91f8
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: ba615057-e11c-11ee-b80e-246e961c91f8:4-8
Executed_Gtid_Set: ba615057-e11c-11ee-b80e-246e961c91f8:4-8,
c2df1946-e11c-11ee-8026-246e961c91f8:1-3
Auto_Position: 0
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
Master_public_key_path:
Get_master_public_key: 0
Network_Namespace:
1 row in set (0.00 sec)
mysql> select user,host from mysql.user;
+------------------+-----------+
| user             | host      |
+------------------+-----------+
| dba              | %         |
| repl             | %         |
| mysql.infoschema | localhost |
| mysql.session    | localhost |
| mysql.sys        | localhost |
| root             | localhost |
+------------------+-----------+
6 rows in set (0.00 sec)
mysql> select * from test.t1;
+----+------------------+---------------------+
| id | remark           | createtime          |
+----+------------------+---------------------+
|  1 | 第一行:row1      | 2024-03-20 10:25:32 |
|  2 | 第二行:row2      | 2024-03-20 10:25:32 |
|  3 | 第三行:row3      | 2024-03-20 10:25:32 |
+----+------------------+---------------------+
3 rows in set (0.00 sec)
For details on setting up MySQL master-slave replication, see "[Configure Asynchronous Replication]( )".
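Debezium reads change events from the binlog, so the MySQL instance it connects to must have row-format binary logging enabled. A minimal my.cnf sketch with illustrative values (server_id mirrors the master shown above; adjust for your environment, and note that if Debezium reads from a replica, as the port 3307 in the source config below suggests, the replica also needs log_slave_updates=ON):
[mysqld]
server_id=1563306            # must be unique within the replication topology
log_bin=mysql-bin            # enable binary logging
binlog_format=ROW            # Debezium requires row-based events
binlog_row_image=FULL        # capture full row images
gtid_mode=ON                 # optional; matches the GTID sets shown above
enforce_gtid_consistency=ON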
## 4. Install and Deploy the Kafka Connector
Perform the following steps on node2.
### 1. Create the plugin directory
mkdir $KAFKA_HOME/plugins
### 2. Unzip the files into the plugin directory
debezium-connector-mysql
unzip debezium-debezium-connector-mysql-2.4.2.zip -d $KAFKA_HOME/plugins/
kafka-connect-hbase
unzip confluentinc-kafka-connect-hbase-2.0.13.zip -d $KAFKA_HOME/plugins/
### 3. Configure Kafka Connector
#### (1) Configure the properties file
Edit the connect-distributed.properties file:
vim $KAFKA_HOME/config/connect-distributed.properties
The content is as follows:
bootstrap.servers=node2:9092,node3:9092,node4:9092
group.id=connect-cluster
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
offset.storage.topic=connect-offsets
offset.storage.replication.factor=3
offset.storage.partitions=3
config.storage.topic=connect-configs
config.storage.replication.factor=3
status.storage.topic=connect-status
status.storage.replication.factor=3
status.storage.partitions=3
offset.flush.interval.ms=10000
plugin.path=/root/kafka_2.13-3.7.0/plugins
#### (2) Distribute to the other nodes
scp $KAFKA_HOME/config/connect-distributed.properties node3:$KAFKA_HOME/config/
scp $KAFKA_HOME/config/connect-distributed.properties node4:$KAFKA_HOME/config/
scp -r $KAFKA_HOME/plugins node3:$KAFKA_HOME/
scp -r $KAFKA_HOME/plugins node4:$KAFKA_HOME/
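Equivalently, a small loop avoids repeating the commands per node. A sketch, assuming passwordless SSH and an identical $KAFKA_HOME on every node:
for h in node3 node4; do
  scp $KAFKA_HOME/config/connect-distributed.properties $h:$KAFKA_HOME/config/
  scp -r $KAFKA_HOME/plugins $h:$KAFKA_HOME/
done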
#### (3) Start in distributed mode
Run this on all three nodes, starting one worker process on each for fault tolerance and load balancing.
connect-distributed.sh -daemon $KAFKA_HOME/config/connect-distributed.properties
Check the log for ERROR entries:
grep ERROR ~/kafka_2.13-3.7.0/logs/connectDistributed.out
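Besides grepping the log, you can hit a worker's REST root endpoint as a quick liveness check (8083 is the default Connect REST port); it returns the worker version and the Kafka cluster ID:
curl -s http://node2:8083/ | jq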
#### (4) Verify the connector plugins and auto-created topics
List the connector plugins:
curl -X GET http://node2:8083/connector-plugins | jq
The output shows that Kafka Connect has recognized both the HBase sink and MySQL source plugins:
[root@vvml-yz-hbase-test~]#curl -X GET http://node2:8083/connector-plugins | jq
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 494 100 494 0 0 4111 0 --:--:-- --:--:-- --:--:-- 4116
[
  {
    "class": "io.confluent.connect.hbase.HBaseSinkConnector",
    "type": "sink",
    "version": "2.0.13"
  },
  {
    "class": "io.debezium.connector.mysql.MySqlConnector",
    "type": "source",
    "version": "2.4.2.Final"
  },
  {
    "class": "org.apache.kafka.connect.mirror.MirrorCheckpointConnector",
    "type": "source",
    "version": "3.7.0"
  },
  {
    "class": "org.apache.kafka.connect.mirror.MirrorHeartbeatConnector",
    "type": "source",
    "version": "3.7.0"
  },
  {
    "class": "org.apache.kafka.connect.mirror.MirrorSourceConnector",
    "type": "source",
    "version": "3.7.0"
  }
]
[root@vvml-yz-hbase-test~]#
List the topics:
kafka-topics.sh --list --bootstrap-server node2:9092,node3:9092,node4:9092
The output shows that Kafka Connect automatically created three topics at startup: connect-configs, connect-offsets, and connect-status:
[root@vvml-yz-hbase-test~]#kafka-topics.sh --list --bootstrap-server node2:9092,node3:9092,node4:9092
__consumer_offsets
connect-configs
connect-offsets
connect-status
test-1
test-3
[root@vvml-yz-hbase-test~]#
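You can also describe one of these internal topics to confirm that the partition and replication settings configured above took effect:
kafka-topics.sh --describe --topic connect-offsets --bootstrap-server node2:9092,node3:9092,node4:9092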
### 4. Create the source connector
#### (1) Create the MySQL source configuration file
Edit the file:
vim $KAFKA_HOME/plugins/source-mysql.json
The content is as follows:
{
  "name": "mysql-source-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "tasks.max": "1",
    "topic.prefix": "mysql-hbase-test",
    "database.hostname": "172.18.16.156",
    "database.port": "3307",
    "database.user": "dba",
    "database.password": "123456",
    "database.server.id": "1563307",
    "database.server.name": "dbserver1",
    "database.include.list": "test",
    "schema.history.internal.kafka.bootstrap.servers": "node2:9092,node3:9092,node4:9092",
    "schema.history.internal.kafka.topic": "schemahistory.mysql-hbase-test"
  }
}
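Before registering it, you can run the configuration through Connect's validation endpoint, which reports missing or invalid properties without creating anything. A sketch (the endpoint takes the flat config map, i.e. the contents of the "config" object above, and the plugin can usually be addressed by its simple class name):
curl -s -X PUT -H 'Content-Type: application/json' http://node2:8083/connector-plugins/MySqlConnector/config/validate -d '{"connector.class":"io.debezium.connector.mysql.MySqlConnector","tasks.max":"1","topic.prefix":"mysql-hbase-test","database.hostname":"172.18.16.156","database.port":"3307","database.user":"dba","database.password":"123456","database.server.id":"1563307","database.include.list":"test","schema.history.internal.kafka.bootstrap.servers":"node2:9092,node3:9092,node4:9092","schema.history.internal.kafka.topic":"schemahistory.mysql-hbase-test"}' | jq '.error_count'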
#### (2) Create the MySQL source connector
Create the connector:
curl -X POST -H 'Content-Type: application/json' -i 'http://node2:8083/connectors' -d @"/root/kafka_2.13-3.7.0/plugins/source-mysql.json"
Check the connector status:
curl -X GET http://node2:8083/connectors/mysql-source-connector/status | jq
List the topics:
kafka-topics.sh --list --bootstrap-server node2:9092,node3:9092,node4:9092
The output shows that mysql-source-connector is in the RUNNING state and that three new topics were created automatically:
[root@vvml-yz-hbase-test~]#curl -X POST -H 'Content-Type: application/json' -i 'http://node2:8083/connectors' -d @"/root/kafka_2.13-3.7.0/plugins/source-mysql.json"
HTTP/1.1 201 Created
Date: Wed, 20 Mar 2024 02:31:30 GMT
Location: http://node2:8083/connectors/mysql-source-connector
Content-Type: application/json
Content-Length: 579
Server: Jetty(9.4.53.v20231009)
{"name":"mysql-source-connector","config":{"connector.class":"io.debezium.connector.mysql.MySqlConnector","tasks.max":"1","topic.prefix":"mysql-hbase-test","database.hostname":"172.18.16.156","database.port":"3307","database.user":"dba","database.password":"123456","database.server.id":"1563307","database.server.name":"dbserver1","database.include.list":"test","schema.history.internal.kafka.bootstrap.servers":"node2:9092,node3:9092,node4:9092","schema.history.internal.kafka.topic":"schemahistory.mysql-hbase-test","name":"mysql-source-connector"},"tasks":[],"type":"source"}
[root@vvml-yz-hbase-test~]#curl -X GET http://node2:8083/connectors/mysql-source-connector/status | jq
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 182 100 182 0 0 20726 0 --:--:-- --:--:-- --:--:-- 22750
{
  "name": "mysql-source-connector",
  "connector": {
    "state": "RUNNING",
    "worker_id": "172.18.4.188:8083"
  },
  "tasks": [
    {
      "id": 0,
      "state": "RUNNING",
      "worker_id": "172.18.4.188:8083"
    }
  ],
  "type": "source"
}
[root@vvml-yz-hbase-test~]#kafka-topics.sh --list --bootstrap-server node2:9092,node3:9092,node4:9092
__consumer_offsets
connect-configs
connect-offsets
connect-status
mysql-hbase-test
mysql-hbase-test.test.t1
schemahistory.mysql-hbase-test
test-1
test-3
[root@vvml-yz-hbase-test~]#
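To inspect the raw change events Debezium wrote during the snapshot, you can read the table topic directly; --from-beginning replays from the start and --max-messages keeps the output short:
kafka-console-consumer.sh --bootstrap-server node2:9092 --topic mysql-hbase-test.test.t1 --from-beginning --max-messages 3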
### 5. Create the sink connector
#### (1) Create the HBase sink configuration file
Edit the file:
vim $KAFKA_HOME/plugins/sink-hbase.json
The content is as follows:
{
  "name": "hbase-sink-connector",
  "config": {
    "topics": "mysql-hbase-test.test.t1",
    "tasks.max": "1",
    "connector.class": "io.confluent.connect.hbase.HBaseSinkConnector",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "org.apache.kafka.connect.storage.StringConverter",
    "confluent.topic.bootstrap.servers": "node2:9092,node3:9092,node4:9092",
    "confluent.topic.replication.factor": 3,
    "hbase.zookeeper.quorum": "node2,node3,node4",
    "hbase.zookeeper.property.clientPort": "2181",
    "auto.create.tables": "true",
    "auto.create.column.families": "true",
    "table.name.format": "example_table"
  }
}
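If you need to adjust these settings later, the configuration can be replaced in place via PUT /connectors/{name}/config instead of deleting and recreating the connector. A sketch; note this endpoint takes only the bare config map, so the hypothetical sink-hbase-config.json here would contain just the contents of the "config" object above:
curl -s -X PUT -H 'Content-Type: application/json' http://node2:8083/connectors/hbase-sink-connector/config -d @"/root/kafka_2.13-3.7.0/plugins/sink-hbase-config.json" | jq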
#### (2) Create the HBase sink connector
Create the connector:
curl -X POST -H 'Content-Type: application/json' -i 'http://node2:8083/connectors' -d @"/root/kafka_2.13-3.7.0/plugins/sink-hbase.json"
Check the connector status:
curl -X GET http://node2:8083/connectors/hbase-sink-connector/status | jq
List the consumer groups:
kafka-consumer-groups.sh --list --bootstrap-server node2:9092,node3:9092,node4:9092
The output shows that hbase-sink-connector is in the RUNNING state and that a consumer group was created automatically:
[root@vvml-yz-hbase-test~]#curl -X POST -H 'Content-Type: application/json' -i 'http://node2:8083/connectors' -d @"/root/kafka_2.13-3.7.0/plugins/sink-hbase.json"
HTTP/1.1 201 Created
Date: Wed, 20 Mar 2024 02:33:11 GMT
Location: http://node2:8083/connectors/hbase-sink-connector
Content-Type: application/json
Content-Length: 654
Server: Jetty(9.4.53.v20231009)
{"name":"hbase-sink-connector","config":{"topics":"mysql-hbase-test.test.t1","tasks.max":"1","connector.class":"io.confluent.connect.hbase.HBaseSinkConnector","key.converter":"org.apache.kafka.connect.storage.StringConverter","value.converter":"org.apache.kafka.connect.storage.StringConverter","confluent.topic.bootstrap.servers":"node2:9092,node3:9092,node4:9092","confluent.topic.replication.factor":"3","hbase.zookeeper.quorum":"node2,node3,node4","hbase.zookeeper.property.clientPort":"2181","auto.create.tables":"true","auto.create.column.families":"true","table.name.format":"example_table","name":"hbase-sink-connector"},"tasks":[],"type":"sink"}
[root@vvml-yz-hbase-test~]#curl -X GET http://node2:8083/connectors/hbase-sink-connector/status | jq
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 176 100 176 0 0 23084 0 --:--:-- --:--:-- --:--:-- 25142
{
  "name": "hbase-sink-connector",
  "connector": {
    "state": "RUNNING",
    "worker_id": "172.18.4.71:8083"
  },
  "tasks": [
    {
      "id": 0,
      "state": "RUNNING",
      "worker_id": "172.18.4.71:8083"
    }
  ],
  "type": "sink"
}
[root@vvml-yz-hbase-test~]#kafka-consumer-groups.sh --list --bootstrap-server node2:9092,node3:9092,node4:9092
connect-hbase-sink-connector
[root@vvml-yz-hbase-test~]#
### 6. Automatic synchronization of existing data
The sink connector automatically created the example_table table in HBase and synchronized the three test rows that were inserted earlier while configuring MySQL replication:
[root@vvml-yz-hbase-test~]#hbase shell
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
For Reference, please visit: http://hbase.apache.org/2.0/book.html#shell
Version 2.5.7-hadoop3, r6788f98356dd70b4a7ff766ea7a8298e022e7b95, Thu Dec 14 16:16:10 PST 2023
Took 0.0012 seconds
hbase:001:0> list
TABLE
SYSTEM:CATALOG
SYSTEM:CHILD_LINK
SYSTEM:FUNCTION
SYSTEM:LOG
SYSTEM:MUTEX
SYSTEM:SEQUENCE
SYSTEM:STATS
SYSTEM:TASK
example_table
test
10 row(s)
Took 0.3686 seconds
=> ["SYSTEM:CATALOG", "SYSTEM:CHILD_LINK", "SYSTEM:FUNCTION", "SYSTEM:LOG", "SYSTEM:MUTEX", "SYSTEM:SEQUENCE", "SYSTEM:STATS", "SYSTEM:TASK", "example_table", "test"]
hbase:002:0> describe 'example_table'
Table example_table is ENABLED
example_table, {TABLE_ATTRIBUTES => {METADATA => {'hbase.store.file-tracker.impl' => 'DEFAULT'}}}
COLUMN FAMILIES DESCRIPTION
{NAME => 'mysql-hbase-test.test.t1', INDEX_BLOCK_ENCODING => 'NONE', VERSIONS => '1', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', IN_MEMORY => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536 B (64KB)'}
1 row(s)
Quota is disabled
Took 0.1173 seconds
hbase:003:0> scan 'example_table',{FORMATTER=>'toString'}
ROW COLUMN+CELL
{"id":1} column=mysql-hbase-test.test.t1:KAFKA_VALUE, timestamp=2024-03-20T10:33:13.587, value={"before":null,"after":{"id":1,"remark":"第一行:row1","createtime":"2024-03-20T02:25:32Z"},"source":{"version":"2.4.2.Final","connector":"mysql","name":"mysql-hbase-test","ts_ms":1710901892000,"snapshot":"first","db":"test","sequence":null,"table":"t1","server_id":0,"gtid":null,"file":"mysql-bin.000001","pos":2465,"row":0,"thread":null,"query":null},"op":"r","ts_ms":1710901892115,"transaction":null}
{"id":2} column=mysql-hbase-test.test.t1:KAFKA_VALUE, timestamp=2024-03-20T10:33:13.593, value={"before":null,"after":{"id":2,"remark":"第二行:row2","createtime":"2024-03-20T02:25:32Z"},"source":{"version":"2.4.2.Final","connector":"mysql","name":"mysql-hbase-test","ts_ms":1710901892000,"snapshot":"true","db":"test","sequence":null,"table":"t1","server_id":0,"gtid":null,"file":"mysql-bin.000001","pos":2465,"row":0,"thread":null,"query":null},"op":"r","ts_ms":1710901892117,"transaction":null}
{"id":3} column=mysql-hbase-test.test.t1:KAFKA_VALUE, timestamp=2024-03-20T10:33:13.596, value={"before":null,"after":{"id":3,"remark":"第三行:row3","createtime":"2024-03-20T02:25:32Z"},"source":{"version":"2.4.2.Final","connector":"mysql","name":"mysql-hbase-test","ts_ms":1710901892000,"snapshot":"last","db":"test","sequence":null,"table":"t1","server_id":0,"gtid":null,"file":"mysql-bin.000001","pos":2465,"row":0,"thread":null,"query":null},"op":"r","ts_ms":1710901892117,"transaction":null}
3 row(s)
Took 0.0702 seconds
hbase:004:0>
By default, debezium-connector-mysql writes the existing data to Kafka when it starts. For building a real-time data warehouse, this means existing and incremental data can be synchronized in a single real-time step, which greatly simplifies the CDC (Change Data Capture) process.
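This behavior is governed by Debezium's snapshot.mode property, which defaults to "initial" (snapshot the existing rows, then stream the binlog). If you only want new changes, adding the following line to the "config" block of source-mysql.json should skip the data snapshot while still capturing the table schema:
"snapshot.mode": "schema_only"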
### 7. Real-time data synchronization test
Make data changes on the MySQL master:
insert into test.t1 (remark) values ('第四行:row4');
update test.t1 set remark = '第五行:row5' where id = 4;
delete from test.t1 where id =1;
Check the data changes in HBase:
hbase:004:0> scan 'example_table',{FORMATTER=>'toString'}
ROW COLUMN+CELL
{"id":1} column=mysql-hbase-test.test.t1:KAFKA_VALUE, timestamp=2024-03-20T10:33:13.587, value={"before":null,"after":{"id":1,"remark":"第一行:row1","createtime":"2024-03-20T02:25:32Z"},"source":{"version":"2.4.2.Final","connector":"mysql","name":"mysql-hbase-test","ts_ms":1710901892000,"snapshot":"first","db":"test","sequence":null,"table":"t1","server_id":0,"gtid":null,"file":"mysql-bin.000001","pos":2465,"row":0,"thread":null,"query":null},"op":"r","ts_ms":1710901892115,"transaction":null}
{"id":2} column=mysql-hbase-test.test.t1:KAFKA_VALUE, timestamp=2024-03-20T10:33:13.593, value={"before":null,"after":{"id":2,"remark":"第二行:row2","createtime":"2024-03-20T02:25:32Z"},"source":{"version":"2.4.2.Final","connector":"mysql","name":"mysql-hbase-test","ts_ms":1710901892000,"snapshot":"true","db":"test","sequence":null,"table":"t1","server_id":0,"gtid":null,"file":"mysql-bin.000001","pos":2465,"row":0,"thread":null,"query":null},"op":"r","ts_ms":1710901892117,"transaction":null}
{"id":3} column=mysql-hbase-test.test.t1:KAFKA_VALUE, timestamp=2024-03-20T10:33:13.596, value={"before":null,"after":{"id":3,"remark":"第三行:row3","createtime":"2024-03-20T02:25:32Z"},"source":{"version":"2.4.2.Final","connector":"mysql","name":"mysql-hbase-test","ts_ms":1710901892000,"snapshot":"last","db":"test","sequence":null,"table":"t1","server_id":0,"gtid":null,"file":"mysql-bin.000001","pos":2465,"row":0,"thread":null,"query":null},"op":"r","ts_ms":1710901892117,"transaction":null}
{"id":4} column=mysql-hbase-test.test.t1:KAFKA_VALUE, timestamp=2024-03-20T10:38:18.788, value={"before":null,"after":{"id":4,"remark":"第四行:row4","createtime":"2024-03-20T02:38:18Z"},"source":{"version":"2.4.2.Final","connector":"mysql","name":"mysql-hbase-test","ts_ms":1710902298000,"snapshot":"false","db":"test","sequence":null,"table":"t1","server_id":1563306,"gtid":"ba615057-e11c-11ee-b80e-246e961c91f8:9","file":"mysql-bin.000001","pos":2679,"row":0,"thread":49,"query":null},"op":"c","ts_ms":1710902298665,"transaction":null}
4 row(s)
Took 0.0091 seconds
hbase:005:0>
The delete and update operations executed on MySQL were not synchronized to HBase.
Check the consumption status:
[root@vvml-yz-hbase-test~]#kafka-consumer-groups.sh --group connect-hbase-sink-connector --describe --bootstrap-server node2:9092,node3:9092,node4:9092
Consumer group 'connect-hbase-sink-connector' has no active members.
GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
connect-hbase-sink-connector mysql-hbase-test.test.t1 0 3 7 4 - - -
[root@vvml-yz-hbase-test~]#
All the data changes were written to Kafka, but not all of them were consumed.
Check the sink connector status:
[root@vvml-yz-hbase-test~]#curl -X GET http://node2:8083/connectors/hbase-sink-connector/status | jq
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 2168 100 2168 0 0 368k 0 --:--:-- --:--:-- --:--:-- 423k
{
  "name": "hbase-sink-connector",
  "connector": {
    "state": "RUNNING",
    "worker_id": "172.18.4.71:8083"
  },
  "tasks": [
    {
      "id": 0,
      "state": "FAILED",
      "worker_id": "172.18.4.71:8083",
      "trace": "org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:632)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:350)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:250)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:219)\n\tat org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:204)\n\tat org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:259)\n\tat org.apache.kafka.connect.runtime.isolation.Plugins.lambda$withClassLoader$1(Plugins.java:237)\n\tat java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base/java.lang.Thread.run(Thread.java:834)\nCaused by: org.apache.kafka.connect.errors.ConnectException: Error inserting record in topic mysql-hbase-test.test.t1 with offset 4 and partition 0 to table example_table: \n\tat io.confluent.connect.bigtable.client.BigtableClient.handleWriteErrors(BigtableClient.java:515)\n\tat io.confluent.connect.bigtable.client.BigtableClient.insert(BigtableClient.java:287)\n\tat io.confluent.connect.bigtable.client.InsertWriter.writeInserts(InsertWriter.java:58)\n\tat io.confluent.connect.bigtable.client.InsertWriter.write(InsertWriter.java:48)\n\tat io.confluent.connect.bigtable.BaseBigtableSinkTask.put(BaseBigtableSinkTask.java:100)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:601)\n\t... 11 more\nCaused by: org.apache.kafka.connect.errors.ConnectException: Row with specified row key already exists.\n\t... 16 more\n"
    }
  ],
  "type": "sink"
}
[root@vvml-yz-hbase-test~]#
Check the log file ~/kafka_2.13-3.7.0/logs/connectDistributed.out on node3; the error message is as follows:
[2024-03-20 10:38:18,794] ERROR [hbase-sink-connector|task-0] WorkerSinkTask{id=hbase-sink-connector-0} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted. Error: Error inserting record in topic mysql-hbase-test.test.t1 with offset 4 and partition 0 to table example_table: (org.apache.kafka.connect.runtime.WorkerSinkTask:630)
org.apache.kafka.connect.errors.ConnectException: Error inserting record in topic mysql-hbase-test.test.t1 with offset 4 and partition 0 to table example_table:
at io.confluent.connect.bigtable.client.BigtableClient.handleWriteErrors(BigtableClient.java:515)
at io.confluent.connect.bigtable.client.BigtableClient.insert(BigtableClient.java:287)
at io.confluent.connect.bigtable.client.InsertWriter.writeInserts(InsertWriter.java:58)
at io.confluent.connect.bigtable.client.InsertWriter.write(InsertWriter.java:48)
at io.confluent.connect.bigtable.BaseBigtableSinkTask.put(BaseBigtableSinkTask.java:100)
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:601)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:350)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:250)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:219)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:204)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:259)
at org.apache.kafka.connect.runtime.isolation.Plugins.lambda$withClassLoader$1(Plugins.java:237)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.apache.kafka.connect.errors.ConnectException: Row with specified row key already exists.
... 16 more
[2024-03-20 10:38:18,794] ERROR [hbase-sink-connector|task-0] WorkerSinkTask{id=hbase-sink-connector-0} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask:212)
org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:632)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:350)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:250)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:219)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:204)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:259)
at org.apache.kafka.connect.runtime.isolation.Plugins.lambda$withClassLoader$1(Plugins.java:237)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.apache.kafka.connect.errors.ConnectException: Error inserting record in topic mysql-hbase-test.test.t1 with offset 4 and partition 0 to table example_table:
at io.confluent.connect.bigtable.client.BigtableClient.handleWriteErrors(BigtableClient.java:515)
at io.confluent.connect.bigtable.client.BigtableClient.insert(BigtableClient.java:287)
at io.confluent.connect.bigtable.client.InsertWriter.writeInserts(InsertWriter.java:58)
at io.confluent.connect.bigtable.client.InsertWriter.write(InsertWriter.java:48)
at io.confluent.connect.bigtable.BaseBigtableSinkTask.put(BaseBigtableSinkTask.java:100)
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:601)
... 11 more
Caused by: org.apache.kafka.connect.errors.ConnectException: Row with specified row key already exists.
... 16 more
The reported error is: Row with specified row key already exists.
The cause is that the sink connector converts MySQL update and delete operations into HBase inserts, and the automatically derived row key is the MySQL table's primary key. Since a row with that key already exists, the insert fails. Be aware of this synchronization behavior.
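After resolving the conflicting records (or recreating the HBase table), the failed task must be restarted manually, since Connect will not retry it on its own. The standard task-restart endpoint:
curl -X POST http://node2:8083/connectors/hbase-sink-connector/tasks/0/restart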