Debezium: Postgres to Kafka
I. Environment Setup
1. Environment list:
Source: PostgreSQL 12.4
OS: SUSE 12 SP4 (2 hosts)
Target: Kafka
2. Package preparation
Installing the source PostgreSQL database is not covered here; this article only covers Kafka and Debezium.
ls -lrt
-rw-r--r-- 1 piccism nmgroup 68684719 Jan 19 15:36 kafka_2.12-2.7.0.tgz
drwxr-xr-x 7 root root 4096 Jan 19 15:42 kafka_2.12-2.7.0
-rw-r--r-- 1 piccism nmgroup 6110424 Jan 19 16:27 debezium-connector-postgres-1.4.0.Final-plugin.tar.gz
drwxr-xr-x 2 root root 4096 Jan 20 09:09 debezium-connector-postgres
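Both archives in the listing above are unpacked with tar before the steps that follow:

```shell
# Unpack Kafka and the Debezium Postgres connector plugin
# (filenames taken from the listing above).
tar -xzf kafka_2.12-2.7.0.tgz
tar -xzf debezium-connector-postgres-1.4.0.Final-plugin.tar.gz
```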
II. Installing Kafka
1. Extract the package and enter the directory
cd kafka_2.12-2.7.0/
ls -lrt
total 56
-rw-r--r-- 1 root root 337 Dec 16 21:58 NOTICE
-rw-r--r-- 1 root root 29975 Dec 16 21:58 LICENSE
drwxr-xr-x 2 root root 4096 Dec 16 22:01 site-docs
drwxr-xr-x 3 root root 4096 Jan 19 15:55 bin
drwxr-xr-x 2 root root 4096 Jan 20 09:49 libs
drwxr-xr-x 2 root root 4096 Jan 20 10:35 config
drwxr-xr-x 2 root root 4096 Jan 20 11:02 logs
2. Kafka's main directories are bin, libs, config, and logs. All the commands live in bin, and recent Kafka releases bundle ZooKeeper, so it does not need to be installed separately.
ls -lrt
total 424
-rwxr-xr-x 1 root root 1019 Dec 16 21:58 zookeeper-shell.sh
-rwxr-xr-x 1 root root 1366 Dec 16 21:58 zookeeper-server-stop.sh
-rwxr-xr-x 1 root root 1393 Dec 16 21:58 zookeeper-server-start.sh
-rwxr-xr-x 1 root root 867 Dec 16 21:58 zookeeper-security-migration.sh
drwxr-xr-x 2 root root 4096 Dec 16 21:58 windows
-rwxr-xr-x 1 root root 1714 Dec 16 21:58 trogdor.sh
-rwxr-xr-x 1 root root 958 Dec 16 21:58 kafka-verifiable-producer.sh
-rwxr-xr-x 1 root root 958 Dec 16 21:58 kafka-verifiable-consumer.sh
-rwxr-xr-x 1 root root 863 Dec 16 21:58 kafka-topics.sh
-rwxr-xr-x 1 root root 945 Dec 16 21:58 kafka-streams-application-reset.sh
-rwxr-xr-x 1 root root 1361 Dec 16 21:58 kafka-server-stop.sh
-rwxr-xr-x 1 root root 1376 Dec 16 21:58 kafka-server-start.sh
-rwxr-xr-x 1 root root 10072 Dec 16 21:58 kafka-run-class.sh
-rwxr-xr-x 1 root root 874 Dec 16 21:58 kafka-replica-verification.sh
-rwxr-xr-x 1 root root 874 Dec 16 21:58 kafka-reassign-partitions.sh
-rwxr-xr-x 1 root root 959 Dec 16 21:58 kafka-producer-perf-test.sh
-rwxr-xr-x 1 root root 886 Dec 16 21:58 kafka-preferred-replica-election.sh
-rwxr-xr-x 1 root root 862 Dec 16 21:58 kafka-mirror-maker.sh
-rwxr-xr-x 1 root root 863 Dec 16 21:58 kafka-log-dirs.sh
-rwxr-xr-x 1 root root 870 Dec 16 21:58 kafka-leader-election.sh
-rwxr-xr-x 1 root root 863 Dec 16 21:58 kafka-features.sh
-rwxr-xr-x 1 root root 866 Dec 16 21:58 kafka-dump-log.sh
-rwxr-xr-x 1 root root 869 Dec 16 21:58 kafka-delete-records.sh
-rwxr-xr-x 1 root root 871 Dec 16 21:58 kafka-delegation-tokens.sh
-rwxr-xr-x 1 root root 948 Dec 16 21:58 kafka-consumer-perf-test.sh
-rwxr-xr-x 1 root root 871 Dec 16 21:58 kafka-consumer-groups.sh
-rwxr-xr-x 1 root root 944 Dec 16 21:58 kafka-console-producer.sh
-rwxr-xr-x 1 root root 945 Dec 16 21:58 kafka-console-consumer.sh
-rwxr-xr-x 1 root root 864 Dec 16 21:58 kafka-configs.sh
-rwxr-xr-x 1 root root 873 Dec 16 21:58 kafka-broker-api-versions.sh
-rwxr-xr-x 1 root root 861 Dec 16 21:58 kafka-acls.sh
-rwxr-xr-x 1 root root 1420 Dec 16 21:58 connect-standalone.sh
-rwxr-xr-x 1 root root 1396 Dec 16 21:58 connect-mirror-maker.sh
-rwxr-xr-x 1 root root 1423 Dec 16 21:58 connect-distributed.sh
3. Verification
Start ZooKeeper
cd kafka_2.12-2.7.0/bin/
./zookeeper-server-start.sh ../config/zookeeper.properties
Start the Kafka server
cd kafka_2.12-2.7.0/bin/
./kafka-server-start.sh ../config/server.properties
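The two foreground commands above each occupy a terminal; as a sketch, both services can instead be started in the background (log paths here are arbitrary choices, not Kafka defaults):

```shell
# Start ZooKeeper and the broker in the background from the install root.
cd kafka_2.12-2.7.0
nohup bin/zookeeper-server-start.sh config/zookeeper.properties > /tmp/zookeeper.out 2>&1 &
sleep 5   # give ZooKeeper a moment to come up before the broker connects
nohup bin/kafka-server-start.sh config/server.properties > /tmp/kafka-server.out 2>&1 &
```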
Check that both services are listening
netstat -tunlp|egrep "(2181|9092)"
tcp 0 0 0.0.0.0:9092 0.0.0.0:* LISTEN 15961/java
tcp 0 0 0.0.0.0:2181 0.0.0.0:* LISTEN 7464/java
Start a producer
./kafka-console-producer.sh --broker-list localhost:9092 --topic test
>jamp
>hello
>i
>am
>jampg
Start a consumer
Raneto:/opt/source/kafka_2.12-2.7.0/bin # ./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
[2021-01-20 11:57:10,399] INFO [GroupCoordinator 0]: Preparing to rebalance group console-consumer-18929 in state PreparingRebalance with old generation 0 (__consumer_offsets-10) (reason: Adding new member consumer-console-consumer-18929-1-5764840d-6ad0-4c0a-ac55-6f9f019abadf with group instance id None) (kafka.coordinator.group.GroupCoordinator)
[2021-01-20 11:57:10,400] INFO [GroupCoordinator 0]: Stabilized group console-consumer-18929 generation 1 (__consumer_offsets-10) (kafka.coordinator.group.GroupCoordinator)
[2021-01-20 11:57:10,404] INFO [GroupCoordinator 0]: Assignment received from leader for group console-consumer-18929 for generation 1 (kafka.coordinator.group.GroupCoordinator)
jamp
hello
i
am
jampg
III. Installing Debezium
1. Extract the package and enter the directory
cd debezium-connector-postgres/
ls -lrt
total 6860
-rw-rw-r-- 1 1000 1000 4617 Jan 7 15:35 failureaccess-1.0.1.jar
-rw-rw-r-- 1 1000 1000 2858426 Jan 7 15:36 guava-30.0-jre.jar
-rw-rw-r-- 1 1000 1000 932808 Jan 7 15:37 postgresql-42.2.14.jar
-rw-rw-r-- 1 1000 1000 1634485 Jan 7 15:37 protobuf-java-3.8.0.jar
-rw-rw-r-- 1 1000 1000 13225 Jan 7 16:38 README_ZH.md
-rw-rw-r-- 1 1000 1000 14063 Jan 7 16:38 README.md
-rw-rw-r-- 1 1000 1000 11357 Jan 7 16:38 LICENSE.txt
-rw-rw-r-- 1 1000 1000 129157 Jan 7 16:38 LICENSE-3rd-PARTIES.txt
-rw-rw-r-- 1 1000 1000 3564 Jan 7 16:38 COPYRIGHT.txt
-rw-rw-r-- 1 1000 1000 19376 Jan 7 16:38 CONTRIBUTE.md
-rw-rw-r-- 1 1000 1000 222302 Jan 7 16:38 CHANGELOG.md
-rw-rw-r-- 1 1000 1000 20269 Jan 7 16:38 debezium-api-1.4.0.Final.jar
-rw-rw-r-- 1 1000 1000 818548 Jan 7 16:38 debezium-core-1.4.0.Final.jar
-rw-rw-r-- 1 1000 1000 318318 Jan 7 16:40 debezium-connector-postgres-1.4.0.Final.jar
2. Copy all the JARs into Kafka's libs directory
cp *.jar /opt/source/kafka_2.12-2.7.0/libs/
IV. Configuration
1. On the source: set the WAL level (wal_level) to logical
jampg_db=# show wal_level ;
wal_level
-----------
logical
(1 row)
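If wal_level is not already logical, it can be changed with ALTER SYSTEM; the setting only takes effect after a server restart. A sketch, assuming a local default install (connection flags and PGDATA must match your environment):

```shell
# Requires a superuser; wal_level is a restart-only parameter.
psql -U postgres -d jampg_db -c "ALTER SYSTEM SET wal_level = logical;"
pg_ctl restart -D "$PGDATA"
```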
2. On the Kafka side: add a connector configuration file
vi postgres.properties
name=pub2-connector
connector.class=io.debezium.connector.postgresql.PostgresConnector
database.hostname=10.15.6.131
#Database IP address; use your actual value
database.port=10001
#Database port; use your actual value
database.user=postgres
#Database user; use your actual value -- it must have permission to create replication slots
database.password=postgres
#Password; use your actual value
database.dbname=jampg_db
#Name of the database to sync; use your actual value
database.history.kafka.bootstrap.servers=localhost:9092
database.server.name=pub2
publication.autocreate.mode=filtered
plugin.name=pgoutput
#Important: when using pgoutput, the two settings above are required
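For copy-paste convenience, the same file can be written in one step with a heredoc; every value below is this walkthrough's example value and should be replaced with your own:

```shell
# Write the connector config shown above into postgres.properties.
# Host, port, user, password, and dbname are the article's examples.
cat > postgres.properties <<'EOF'
name=pub2-connector
connector.class=io.debezium.connector.postgresql.PostgresConnector
database.hostname=10.15.6.131
database.port=10001
database.user=postgres
database.password=postgres
database.dbname=jampg_db
database.history.kafka.bootstrap.servers=localhost:9092
database.server.name=pub2
publication.autocreate.mode=filtered
plugin.name=pgoutput
EOF
```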
Reference:
Using PostgreSQL pgoutput Plugin for Change Data Capture With Debezium
3. Start Kafka Connect in standalone mode. Debezium then creates one topic per table in the database (no topic is created for tables without a primary key).
nohup ./connect-standalone.sh ../config/connect-standalone.properties ../config/postgres.properties &
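Once the worker is up, the Kafka Connect REST API (port 8083 by default) can confirm that the connector is registered and running:

```shell
# List registered connectors, then check the status of this one;
# "pub2-connector" is the name= value from postgres.properties.
curl -s http://localhost:8083/connectors
curl -s http://localhost:8083/connectors/pub2-connector/status
```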
4. List the topics; they are named <database.server.name>.<schema>.<table>
./kafka-topics.sh --list --zookeeper localhost:2181
__consumer_offsets
pub2.public.tb1_mvcc
5. Insert a row, then start a Kafka consumer to verify that the data is synced
insert into tb1_mvcc values (7,345);
The change event now shows up on the Kafka consumer:
./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic pub2.public.tb1_mvcc --from-beginning
{"schema":{"type":"struct","fields":[{"type":"struct","fields":[{"type":"int32","optional":false,"field":"id"},{"type":"int32","optional":true,"field":"ival"}],"optional":true,"name":"pub2.public.tb1_mvcc.Value","field":"before"},{"type":"struct","fields":[{"type":"int32","optional":false,"field":"id"},{"type":"int32","optional":true,"field":"ival"}],"optional":true,"name":"pub2.public.tb1_mvcc.Value","field":"after"},{"type":"struct","fields":[{"type":"string","optional":false,"field":"version"},{"type":"string","optional":false,"field":"connector"},{"type":"string","optional":false,"field":"name"},{"type":"int64","optional":false,"field":"ts_ms"},{"type":"string","optional":true,"name":"io.debezium.data.Enum","version":1,"parameters":{"allowed":"true,last,false"},"default":"false","field":"snapshot"},{"type":"string","optional":false,"field":"db"},{"type":"string","optional":false,"field":"schema"},{"type":"string","optional":false,"field":"table"},{"type":"int64","optional":true,"field":"txId"},{"type":"int64","optional":true,"field":"lsn"},{"type":"int64","optional":true,"field":"xmin"}],"optional":false,"name":"io.debezium.connector.postgresql.Source","field":"source"},{"type":"string","optional":false,"field":"op"},{"type":"int64","optional":true,"field":"ts_ms"},{"type":"struct","fields":[{"type":"string","optional":false,"field":"id"},{"type":"int64","optional":false,"field":"total_order"},{"type":"int64","optional":false,"field":"data_collection_order"}],"optional":true,"field":"transaction"}],"optional":false,"name":"pub2.public.tb1_mvcc.Envelope"},"payload":{"before":null,"after":{"id":7,"ival":345},"source":{"version":"1.4.0.Final","connector":"postgresql","name":"pub2","ts_ms":1611114506456,"snapshot":"false","db":"jampg_db","schema":"public","table":"tb1_mvcc","txId":756,"lsn":58888874576,"xmin":null},"op":"c","ts_ms":1611114506473,"transaction":null}}
6. Update and delete operations were tested the same way; both replicate correctly.
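As a sketch, those update and delete checks can be driven from psql against the example table (connection flags are assumptions; a consumer on pub2.public.tb1_mvcc should then show events with op "u" and op "d"):

```shell
# Follow-up DML on the row inserted earlier in this walkthrough.
psql -U postgres -d jampg_db -c "update tb1_mvcc set ival = 999 where id = 7;"
psql -U postgres -d jampg_db -c "delete from tb1_mvcc where id = 7;"
```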
V. Conclusion
Little has been written about using Debezium with the pgoutput plugin; the main difficulty is how to write the postgres.properties file, so the overall procedure is recorded above.
Most of it is drawn from the official documentation, which is the most important resource for any technical question.