KingbaseESV8.6变化数据捕捉(CDC)
基于CDC(变更数据捕捉)的增量数据集成总体步骤:
1.捕获源数据库中的更改数据
2.将变更的数据转换为您的消费者可以接受的格式
3.将数据发布到消费者或目标数据库
KingbaseES和PG类似,也支持触发器(trigger)和预写日志(WAL)两种CDC机制。
KingbaseES V8.6 开启WAL日志&自定归档:
1.修改配置:
vi /opt/Kingbase/ES/V8/data/kingbase.conf
wal_level = logical
max_replication_slots = 10
max_wal_senders = 20
archive_mode = on
archive_command = 'test -f /opt/Kingbase/ES/V8/data/pg_archive/archive_active && test ! -f /opt/Kingbase/ES/V8/data/pg_archive/%f && cp %p /opt/Kingbase/ES/V8/data/pg_archive/%f'
2.创建归档目录&授权:
mkdir -p /opt/Kingbase/ES/V8/data/pg_archive
touch /opt/Kingbase/ES/V8/data/pg_archive/archive_active #为了方便控制、配合归档命令设置了归档开关,通过该文件是否存在来控制是否执行归档
chown -R kingbase:kingbase /opt/Kingbase/ES/V8/data/pg_archive
3.重启数据库
service kingbase8d restart
# su - kingbase
$ ksql -Usystem -dtest
test=# show archive_mode;
archive_mode
--------------
on
test=# show archive_command;
archive_command
----------------------------------------------------------------------------------------------------------------------------------------------------------------
test -f /opt/Kingbase/ES/V8/data/pg_archive/archive_active && test ! -f /opt/Kingbase/ES/V8/data/pg_archive/%f && cp %p /opt/Kingbase/ES/V8/data/pg_archive/%f
test=#\q
4.为了使用方便,设置数据目录对应的环境变量KBDATA
# su - kingbase
$ vi .bash_profile
export KBDATA=/opt/Kingbase/ES/V8/data
$ source .bash_profile
$ echo $KBDATA
/opt/Kingbase/ES/V8/data
5.验证归档效果
$ ls -l $KBDATA/pg_archive && echo ">>>>" && ls -l $KBDATA/sys_wal
total 0
-rw-rw-r-- 1 kingbase kingbase 0 Apr 6 18:45 archive_active
>>>>
total 16384
-rw------- 1 kingbase kingbase 16777216 Apr 6 18:51 000000010000000000000001
drwx------ 2 kingbase kingbase 6 Apr 4 17:32 archive_status
[kingbase@localhost ~]$
$ sys_controldata $KBDATA
sys_control version number: 1201
Catalog version number: 202202151
Database system identifier: 7218125339255768305
Database cluster state: in production
sys_control last modified: Thu 06 Apr 2023 06:50:58 PM CST
Latest checkpoint location: 0/17A4878
Latest checkpoint's REDO location: 0/17A4848
Latest checkpoint's REDO WAL file: 000000010000000000000001
Latest checkpoint's TimeLineID: 1
Latest checkpoint's PrevTimeLineID: 1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 0:571
Latest checkpoint's NextOID: 16407
Latest checkpoint's NextMultiXactId: 1
Latest checkpoint's NextMultiOffset: 0
Latest checkpoint's oldestXID: 513
Latest checkpoint's oldestXID's DB: 1
Latest checkpoint's oldestActiveXID: 571
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 1
Latest checkpoint's oldestCommitTsXid:0
Latest checkpoint's newestCommitTsXid:0
Time of latest checkpoint: Thu 06 Apr 2023 06:50:58 PM CST
......
auth method mode: 0
[kingbase@localhost ~]$
以system超级用户登录执行checkpoint & 切换日志文件(必须superuser才能执行checkpoint)
test=# checkpoint;
test=# select pg_switch_wal();
pg_switch_wal
---------------
0/17A4A10
(1 row)
test=# \q
查看信息
[kingbase@localhost ~]$ sys_controldata $KBDATA
sys_control version number: 1201
Catalog version number: 202202151
Database system identifier: 7218125339255768305
Database cluster state: in production
sys_control last modified: Thu 06 Apr 2023 07:18:13 PM CST
Latest checkpoint location: 0/2000088
Latest checkpoint's REDO location: 0/2000058
Latest checkpoint's REDO WAL file: 000000010000000000000002
Latest checkpoint's TimeLineID: 1
Latest checkpoint's PrevTimeLineID: 1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 0:571
Latest checkpoint's NextOID: 16407
Latest checkpoint's NextMultiXactId: 1
Latest checkpoint's NextMultiOffset: 0
Latest checkpoint's oldestXID: 513
Latest checkpoint's oldestXID's DB: 1
Latest checkpoint's oldestActiveXID: 571
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 1
Latest checkpoint's oldestCommitTsXid:0
Latest checkpoint's newestCommitTsXid:0
Time of latest checkpoint: Thu 06 Apr 2023 07:18:13 PM CST
......
auth method mode: 0
[kingbase@localhost ~]$
可以看到 REDO WAL file 已经切换到 000000010000000000000002 了。
查看wal日志目录和归档目录,发现 wal日志文件 000000010000000000000001 已经完成归档:
[kingbase@localhost ~]$ ls -l $KBDATA/pg_archive && echo ">>>>" && ls -l $KBDATA/sys_wal
total 16384
-rw------- 1 kingbase kingbase 16777216 Apr 6 19:18 000000010000000000000001
-rw-rw-r-- 1 kingbase kingbase 0 Apr 6 18:45 archive_active
>>>>
total 32768
-rw------- 1 kingbase kingbase 16777216 Apr 6 19:18 000000010000000000000002
-rw------- 1 kingbase kingbase 16777216 Apr 6 19:18 000000010000000000000003
drwx------ 2 kingbase kingbase 6 Apr 6 19:18 archive_status
[kingbase@localhost ~]$
KingbaseES WAL日志及LSN查询常用命令:基本和 PostgreSQL 一致,可以参考:
《PostgreSQL 常用命令及常见配置参数》
接下来的CDC的测试步骤和PostgreSQL的CDC类似,详情可以参考:
《PostgreSQL 变化数据捕捉(CDC)》
# su - kingbase
$ ksql -Utest -dtest
切换到超级用户给test用户授予流复制权限(创建逻辑复制插槽需要对应权限)
test=# \c - system
test=# \conninfo
test=# ALTER ROLE test REPLICATION; --流复制权限
ALTER ROLE
test=# \du
List of roles
Role name | Attributes | Member of
-----------+------------------------------------------------------------+-----------
sao | No inheritance | {}
sso | No inheritance | {}
system | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
test | Replication | {}
test=#
切回到test普通用户
test=# \c - test
You are now connected to database "test" as user "test".
test=> \conninfo
You are connected to database "test" as user "test" via socket in "/tmp" at port "54321".
test=>
接下来进行CDC测试:
create table public.test03(id integer not null, info varchar(100), primary key(id));
step1: 为需要同步的数据库(db)创建逻辑复制插槽(replication slot)
test=> SELECT pg_create_logical_replication_slot('replication_slot01', 'test_decoding'); -- 使用 test_decoding 输出插件
pg_create_logical_replication_slot
------------------------------------
(replication_slot01,0/20181A8)
(1 row)
SELECT * FROM pg_replication_slots;
如果使用pgoutput输出插件:
SELECT pg_create_logical_replication_slot('replication_slot01', 'pgoutput'); -- 使用 pgoutput 输出插件
ERROR: could not access file "pgoutput": No such file or directory
注意:KingbaseES 不支持 pgoutput,但有其他的应对方案,这里不详细介绍了。
-- 如果要销毁则执行:
-- SELECT pg_drop_replication_slot('replication_slot01');
step2:
-- DROP PUBLICATION IF EXISTS pub01;
CREATE PUBLICATION pub01 FOR TABLE public.test03;
-- 支持多种方式创建:
-- CREATE PUBLICATION pub01;
-- CREATE PUBLICATION pub01 FOR TABLE test01;
-- CREATE PUBLICATION pub01 FOR TABLE public.test01, public.test02;
-- CREATE PUBLICATION pub01 FOR TABLE public.test01, public.test02;
-- CREATE PUBLICATION pub01 FOR ALL TABLES;
-- CREATE PUBLICATION pub01 FOR TABLE public.test01 WITH (publish = 'INSERT, UPDATE, DELETE');
select * from pg_publication;
等同于
\dRp
test=> CREATE PUBLICATION pub01 FOR TABLE public.test03;
CREATE PUBLICATION
test=> select * from pg_publication;
oid | pubname | pubowner | puballtables | pubinsert | pubupdate | pubdelete | pubtruncate
-------+---------+----------+--------------+-----------+-----------+-----------+-------------
16412 | pub01 | 16396 | f | t | t | t | t
(1 row)
test=> \dRp
List of publications
Name | Owner | All tables | Inserts | Updates | Deletes | Truncates
-------+-------+------------+---------+---------+---------+-----------
pub01 | test | f | t | t | t | t
(1 row)
如果想要看到发布关联的表
select * from pg_publication_tables;
等同于
\dRp+
创建发布以后,后续可以修改发布来添加删除表
ALTER PUBLICATION pub01 DROP TABLE public.test02;
验证指定的表是否在发布中:
SELECT * FROM pg_publication_tables WHERE pubname='pub01';
查看逻辑复制效果:
for test_decoding:
SELECT * FROM pg_logical_slot_peek_changes('replication_slot01', NULL, NULL); -- 只查询不删数据
SELECT * FROM pg_logical_slot_get_changes('replication_slot01', NULL, NULL); -- 查询并删除数据
test=> SELECT * FROM pg_logical_slot_peek_changes('replication_slot01', NULL, NULL); -- 只查询不删数据
lsn | xid | data
-----------+-----+------------
0/20181A8 | 573 | BEGIN 573
0/20190B0 | 573 | COMMIT 573
(2 rows)
表数据操作:
insert into public.test03(id,info) values (1,'aaa');
insert into public.test03(id,info) values (2,'bbb'),(3,'ccc');
delete from public.test03 where id=3;
再查看CDC数据:
test=> SELECT * FROM pg_logical_slot_peek_changes('replication_slot01', NULL, NULL); -- 只查询不删数据
lsn | xid | data
-----------+-----+----------------------------------------------------------------
0/20181A8 | 573 | BEGIN 573
0/20190B0 | 573 | COMMIT 573
0/20190E0 | 574 | BEGIN 574
0/20190E0 | 574 | table public.test03: INSERT: id[integer]:1 info[varchar]:'aaa'
0/2019220 | 574 | COMMIT 574
0/2019220 | 575 | BEGIN 575
0/2019220 | 575 | table public.test03: INSERT: id[integer]:2 info[varchar]:'bbb'
0/20192A0 | 575 | table public.test03: INSERT: id[integer]:3 info[varchar]:'ccc'
0/2019350 | 575 | COMMIT 575
0/2019350 | 576 | BEGIN 576
0/2019350 | 576 | table public.test03: DELETE: id[integer]:3
0/20193C0 | 576 | COMMIT 576
(12 rows)
test=>
SELECT * FROM pg_replication_slots;
test=> SELECT * FROM pg_replication_slots;
slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
--------------------+---------------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------
replication_slot01 | test_decoding | logical | 13832 | test | f | f | | | 573 | 0/2018178 | 0/20181A8
(1 row)
SELECT * FROM pg_logical_slot_get_changes('replication_slot01', NULL, NULL); -- 查询并删除数据
for pgoutput(注意:KingbaseES 不支持 pgoutput,但有其他的应对方案,这里不详细介绍了)
-- SELECT * FROM pg_logical_slot_peek_binary_changes('replication_slot01', null, null, 'proto_version', '1', 'publication_names', 'pub01'); -- 查询但不删除数据
-- SELECT * FROM pg_logical_slot_get_binary_changes('replication_slot01', null, null, 'proto_version', '1', 'publication_names', 'pub01'); -- 查询并删除数据
删除发布:
DROP PUBLICATION IF EXISTS pub01;
销毁逻辑复制插槽
SELECT pg_drop_replication_slot('replication_slot01');