Flink CDC SQL: PostgreSQL to Oracle, and Handling the decoderbufs Error

Flink SQL: PostgreSQL to Oracle

Official flink-cdc examples: Oracle CDC Connector and Postgres CDC Connector — CDC Connectors for Apache Flink® documentation

Edit the configuration file postgresql.conf
Mine lives at /var/lib/pgsql/10/data/postgresql.conf
# Change the WAL level to logical
wal_level = logical # minimal, replica, or logical

# Raise the maximum number of replication slots (default 10); flink-cdc uses one slot per table by default
max_replication_slots = 20 # max number of replication slots

# Raise the maximum number of WAL sender processes (default 10); keep it in line with the slot setting above
max_wal_senders = 20 # max number of walsender processes
# Terminate replication connections that stay inactive longer than this (default 60s); a somewhat larger value is safer
wal_sender_timeout = 180s # in milliseconds; 0 disables the timeout
(3) Notes
wal_level must be changed; the other parameters are optional, but if you replicate more than 10 tables, raise them to suitable values.
Changes to postgresql.conf only take effect after a PostgreSQL restart, so they are usually applied during off-peak hours.
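After the restart, the effective values can be checked from psql (a quick sanity check; the expected values correspond to the settings above):

```sql
-- run in psql after restarting PostgreSQL
SHOW wal_level;              -- expect: logical
SHOW max_replication_slots;  -- expect: 20
SHOW max_wal_senders;        -- expect: 20
SHOW wal_sender_timeout;     -- expect: 3min (i.e. 180s)
```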

1. Create the PostgreSQL CDC source table
CREATE TABLE flink_pg(
  ID INT,
  PRIMARY KEY (ID) NOT ENFORCED
) WITH (
  'connector' = 'postgres-cdc',
  'hostname' = '<PG_host_IP>',
  'port' = '5432',
  'username' = 'postgres',
  'password' = '123456',
  'database-name' = 'postgres',
  'schema-name' = 'public',
  'table-name' = 'sink2',
  -- critical: without this option no data comes through; the possible values are covered below
  'decoding.plugin.name' = 'pgoutput',
  -- critical: each job needs its own slot name, otherwise a second job fails to start
  'debezium.slot.name' = 'slot_2'
);

2. Error: could not access file "decoderbufs": No such file or directory

Install the postgres-decoderbufs build environment (required)
1. Upgrade GCC/G++
Using devtoolset is strongly recommended; building GCC from source is extremely slow and failure-prone.
[root@localhost ~]# sudo yum install devtoolset-4-gcc devtoolset-4-gcc-c++ -y
[root@localhost ~]# scl enable devtoolset-4 bash
[root@localhost ~]# echo "source /opt/rh/devtoolset-4/enable" >> /etc/profile
[root@localhost ~]# source /etc/profile
2. If devtoolset-4-gcc cannot be installed
The centos-release-scl mirrors no longer ship devtoolset-4, so pull it from the copr repo instead:
linux>wget https://copr.fedoraproject.org/coprs/hhorak/devtoolset-4-rebuild-bootstrap/repo/epel-7/hhorak-devtoolset-4-rebuild-bootstrap-epel-7.repo -O /etc/yum.repos.d/devtools-4.repo

linux>yum install devtoolset-4-gcc devtoolset-4-binutils devtoolset-4-gcc-c++

linux>scl enable devtoolset-4 bash
then rerun the previous step
3. Install the build dependencies
linux>yum install autoconf automake libtool readline-devel zlib-devel libxslt-devel json-c-devel pcre-devel unzip -y
4. Upgrade autoconf (version >= 2.64)
[root@localhost ~]# wget ftp://ftp.gnu.org/gnu/autoconf/autoconf-2.69.tar.gz
[root@localhost ~]# tar zxvf autoconf-2.69.tar.gz 
[root@localhost ~]# cd autoconf-2.69
[root@localhost ~]# ./configure --prefix=/usr/  
[root@localhost ~]# make && make install
[root@localhost ~]# /usr/bin/autoconf -V 
autoconf (GNU Autoconf) 2.69
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+/Autoconf: GNU GPL version 3 or later
<http://gnu.org/licenses/gpl.html>, <http://gnu.org/licenses/exceptions.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
5. Download and build the following dependencies
linux>wget http://download.osgeo.org/geos/geos-3.6.2.tar.bz2
linux>wget http://download.osgeo.org/proj/proj-4.9.3.tar.gz
linux>wget http://download.osgeo.org/gdal/2.2.3/gdal-2.2.3.tar.gz
linux>wget https://download.osgeo.org/postgis/source/postgis-2.3.7.tar.gz
* None of the four packages above needs --prefix= at configure time
linux>tar -jxvf geos-3.6.2.tar.bz2
linux>cd geos-3.6.2
linux>./configure
linux>make && make install

PostGIS needs the location of pg_config at configure time
linux>find / -name pg_config
linux>./configure --with-pgconfig=/usr/pgsql-10/bin/pg_config

Some of these dependencies may be needed (package names vary between repos; for PG 10 the right one is usually postgresql10-devel):
linux>yum install postgresql10-server-dev
linux>yum install postgresql-devel
linux>yum install postgresql-devel10
linux>yum install postgresql10-devel
linux>yum install postgresql10-contrib

Missing shared libraries can also break the build; locate them and copy them into the PG lib directory
linux>ldd /usr/pgsql-10/lib/postgis-2.3.so
linux>find / -name libgeos_c.so.1
linux>find / -name libproj.so.12
linux>cp /usr/local/lib/libgeos_c.so.1 /usr/pgsql-10/lib/
linux>cp /usr/local/lib/libproj.so.12 /usr/pgsql-10/lib/
6. If the previous step fails to install PostGIS, try installing it via yum instead

yum install postgis23_10.x86_64 (mind the version number)

  1. Install the tool packages first
    linux>yum install wget net-tools epel-release
  2. Then install postgis
    linux>yum install postgis32_14 postgis32_14-client -y
  3. Install the extension tools
    linux>yum install ogr_fdw10 -y
    linux>yum install pgrouting_14 -y
7. Enable the PostGIS extensions in PG
postgres=# CREATE EXTENSION postgis;
postgres=# CREATE EXTENSION postgis_topology;
postgres=# CREATE EXTENSION fuzzystrmatch;
postgres=# CREATE EXTENSION postgis_tiger_geocoder;
7.1 Verify the PostGIS installation
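A quick way to verify is to ask PostGIS for its version (the exact version string will differ on your build; an error here means the extension libraries are not loadable):

```sql
-- run in psql after CREATE EXTENSION postgis
SELECT PostGIS_Version();
SELECT PostGIS_Full_Version();
```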

(7.2 Error while creating the PostGIS extensions)

(7.2 Fix)
linux>ldd /usr/pgsql-10/lib/rtpostgis-2.3.so   # check linkage; shows libgdal.so.20 => not found
linux>find / -name libgdal.so.20               # locate the file
linux>cp /usr/local/lib/libgdal.so.20 /usr/pgsql-10/lib/   # copying it into the PG 10 lib directory fixes it

8. Install ProtoBuf 3.3.0
[root@localhost ~]# wget https://github.com/google/protobuf/archive/v3.3.0.tar.gz
[root@localhost ~]# tar zxvf v3.3.0.tar.gz && cd protobuf-3.3.0
[root@localhost ~]# ./autogen.sh && ./configure --prefix=/usr/local/protobuf --libdir=/usr/lib64
[root@localhost ~]# make && make install
[root@localhost ~]# ldconfig
[root@localhost ~]# echo 'export PATH=$PATH:/usr/local/protobuf/bin' >> /etc/profile
[root@localhost ~]# protoc --version
8.1 Install these prerequisites first, otherwise ProtoBuf 3.3.0 will not build
Because access to Google is blocked, the downloads can be troublesome. `vim autogen.sh` shows the script fetches two archives, googlemock-release-1.7.0.zip and googletest-release-1.7.0.zip; download them by hand, follow the same steps the script performs, and only then build:
linux>curl $curlopts -L -O https://github.com/google/googlemock/archive/release-1.7.0.zip
linux>unzip -q release-1.7.0.zip
linux>rm release-1.7.0.zip
linux>mv googlemock-release-1.7.0 gmock

linux>curl $curlopts -L -O https://github.com/google/googletest/archive/release-1.7.0.zip
linux>unzip -q release-1.7.0.zip
linux>rm release-1.7.0.zip
linux>mv googletest-release-1.7.0 gmock/gtest
9. Install ProtoBuf-C 1.2.1
[root@localhost ~]# wget https://github.com/protobuf-c/protobuf-c/archive/v1.2.1.tar.gz
[root@localhost ~]# tar zxvf v1.2.1.tar.gz && cd protobuf-c-1.2.1
[root@localhost ~]# ./autogen.sh && ./configure --prefix=/usr/local/protobuf-c --libdir=/usr/lib64/
[root@localhost ~]# make && make install
10. Install postgresql-decoderbufs
[postgres@localhost ~]$ wget https://github.com/debezium/postgres-decoderbufs/archive/v0.7.5.tar.gz
If this tarball fails to build, try the repository instead:
GitHub - debezium/postgres-decoderbufs: A PostgreSQL logical decoder output plugin to deliver data as Protocol Buffers, used by Debezium (http://debezium.io)
[postgres@localhost ~]$ tar xzvf v0.7.5.tar.gz 
[postgres@localhost ~]$ make USE_PGXS=1 PG_CONFIG=/usr/pgsql-10/bin/pg_config
[postgres@localhost ~]$ make install USE_PGXS=1 PG_CONFIG=/usr/pgsql-10/bin/pg_config
[postgres@localhost ~]$ cat /var/lib/pgsql/10/data/postgresql.conf
listen_addresses = '*'
shared_preload_libraries = 'decoderbufs'
wal_level = logical
max_wal_senders = 10
wal_keep_segments = 4
max_replication_slots = 4

11. Set shared_preload_libraries
shared_preload_libraries = 'decoderbufs'
12. Restart PostgreSQL
linux>systemctl stop postgresql-10      # stop
linux>systemctl start postgresql-10     # start
linux>systemctl restart postgresql-10   # restart
linux>systemctl status postgresql-10    # status
13. Verify: if the following statement returns a row with no error, the plugin works
postgres=# select * from pg_create_logical_replication_slot('decoderbufs_demo', 'decoderbufs');
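The verification slot created above is a real replication slot and keeps WAL from being recycled, so it is worth dropping it once the check passes (the slot name matches the one created above):

```sql
-- confirm decoderbufs is preloaded, then remove the test slot
SHOW shared_preload_libraries;                         -- expect: decoderbufs
SELECT pg_drop_replication_slot('decoderbufs_demo');
```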

Build error

Fix: this happens when the C++ compiler packages are missing; run in a terminal:
linux>sudo yum install glibc-headers gcc-c++

Dependency error
Fix:
linux>sudo rpm -ivh https://dl.fedoraproject.org/pub/epel/7/x86_64/Packages/e/epel-release-7-14.noarch.rpm
Flink CDC capturing PostgreSQL data
1) Edit the configuration file
postgresql.conf needs the same changes as at the top of this post
linux>vi postgresql.conf
# Change the WAL level to logical
wal_level = logical            # minimal, replica, or logical
# Raise the maximum number of replication slots (default 10); flink-cdc uses one slot per table by default
max_replication_slots = 20           # max number of replication slots
# Raise the maximum number of WAL sender processes (default 10); keep it in line with the slot setting above
max_wal_senders = 20    # max number of walsender processes
# Terminate replication connections inactive longer than this (default 60s); a somewhat larger value is safer
wal_sender_timeout = 180s        # in milliseconds; 0 disables the timeout
2) Note
With wal_level = logical, the default logical replication stream only carries the primary key of a modified row. To receive all columns of the old row on update/delete, change the table-level setting: ALTER TABLE tableName REPLICA IDENTITY FULL; only then are all column values captured.
3) Copy the connector jar into Flink's lib directory
flink-sql-connector-postgres-cdc-2.2.0.jar goes into flink/lib
4) Create a new user
-- create a new PG user
CREATE USER user WITH PASSWORD 'pwd';
5) Grant the user replication privileges
ALTER ROLE user replication;
6) Grant the user permission to connect to the database
grant CONNECT ON DATABASE test to user;
7) Grant the user SELECT on all tables under public in the current database
GRANT SELECT ON ALL TABLES IN SCHEMA public TO user;
8) Publish the tables
-- publish all tables
CREATE PUBLICATION dbz_publication FOR ALL TABLES;
-- if a publication already exists without covering all tables, force it to
update pg_publication set puballtables=true where pubname is not null;
-- check which tables have been published
select * from pg_publication_tables;
9) Change the table's replica identity so updates and deletes carry the old values
-- include previous values for updates and deletes
ALTER TABLE test0425 REPLICA IDENTITY FULL;
-- check the replica identity ('f' means FULL, i.e. the change took effect)
select relreplident from pg_class where relname='test0425';
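If every table in public needs FULL replica identity, looping over pg_tables saves running each ALTER by hand (a sketch; adjust the schema name to your setup):

```sql
DO $$
DECLARE
  r record;
BEGIN
  -- apply REPLICA IDENTITY FULL to every table in the public schema
  FOR r IN SELECT tablename FROM pg_tables WHERE schemaname = 'public' LOOP
    EXECUTE format('ALTER TABLE public.%I REPLICA IDENTITY FULL', r.tablename);
  END LOOP;
END $$;
```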

At this point the setup is complete; all of the steps above are required.
1. Create the PostgreSQL source connector on the Flink SQL side
CREATE TABLE flink_cdc_source (
  id INT,
  name STRING
) WITH (
  'connector' = 'postgres-cdc',
  'hostname' = '192.168.58.201',
  'port' = '5432',
  'database-name' = 'postgres',
  'schema-name' = 'public',
  'username' = 'postgres',
  'password' = '123456',
  'table-name' = 'pg_cdc_source',
  'decoding.plugin.name' = 'pgoutput'
);

2. Error: replication slot "flink" already exists

(2.1 Fixing the replication slot "flink" already exists error)
1. Switch to the postgres user
# su - postgres
2. Log in
-bash-4.2$ psql -U postgres
3. List the replication slots
postgres=# select * from pg_replication_slots;

4. Drop the offending slot
SELECT * FROM pg_drop_replication_slot('flink');

5. Verify
postgres=# select * from pg_replication_slots;
Create the Oracle sink on the Flink SQL side
create table flink_cdc_sink (
  ID INT,
  NAME STRING
) with (
  'connector' = 'jdbc',
  'url' = 'jdbc:oracle:thin:@192.168.58.202:1521:ORA19C',
  'username' = 'flinkuser',
  'password' = 'flinkpw',
  'table-name' = 'TEST2',
  'driver' = 'oracle.jdbc.driver.OracleDriver'
);
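With both tables defined, the actual sync job is a single INSERT that reads from the flink_cdc_source table created above (columns map by position, so id/name land in ID/NAME):

```sql
-- continuously stream changes from PostgreSQL into Oracle
INSERT INTO flink_cdc_sink
SELECT id, name FROM flink_cdc_source;
```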
Error

Handling the JDBC-to-Oracle connection error
Fix: the JDBC connector in Flink 1.14 does not support Oracle; install Flink 1.15 to handle this.
Installing Flink 1.15 requires Java 11.
1. Download Java 11 from the official site
https://www.oracle.com/java/technologies/downloads/#java11
2. Extract the JDK tarball
linux>tar -xzvf jdk-11.0.15.1_linux-x64_bin.tar.gz
3. Edit the environment configuration file
linux>vim /etc/profile
# Java 11 environment variables
JAVA_HOME=/devtools/java/java11/jdk-11.0.15
PATH=$JAVA_HOME/bin:$PATH
CLASSPATH=$JAVA_HOME/lib
export JAVA_HOME CLASSPATH PATH

# Java 8 environment variables
JAVA_HOME=/devtools/java/java8/jdk1.8.0_321
PATH=$PATH:$JAVA_HOME/bin:$PATH
CLASSPATH=$JAVA_HOME/lib
export JAVA_HOME PATH CLASSPATH
4. Reboot (or run source /etc/profile) for the change to take effect
5. Download Flink 1.15
linux>wget https://dlcdn.apache.org/flink/flink-1.15.0/flink-1.15.0-bin-scala_2.12.tgz
6. Configure Flink 1.15
vim conf/flink-conf.yaml
jobmanager.rpc.address: <jobmanager_IP>
# high-availability mode
high-availability: zookeeper
# JobManager metadata is kept on DFS; zk only stores a pointer to the DFS path
high-availability.storageDir: hdfs://cluster/flinkha/
# zookeeper quorum (hostnames and ports must match the actual zk setup)
high-availability.zookeeper.quorum: IPA:2181,IPB:2181,IPC:2181
# (optional) set zookeeper's root directory
#high-availability.zookeeper.path.root: /test_dir/test_standalone2_root
# comment out the following settings
# jobmanager.bind-host: localhost
# taskmanager.bind-host: localhost
#taskmanager.host: localhost
#rest.address: localhost
#rest.bind-address: localhost

# yarn high-availability retry count
yarn.application-attempts: 10
Note: the "comment out the following settings" step above is mandatory, otherwise the Web UI is unreachable. The rest of the configuration is unchanged; see the setup at the top of this post.
Flink CDC capturing Postgres changes in Java
package pg;

import com.ververica.cdc.connectors.postgres.PostgreSQLSource;
import com.ververica.cdc.debezium.JsonDebeziumDeserializationSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

import java.util.Properties;

public class FlinkCdcPg {
    public static void main(String[] args) throws Exception {
        Properties properties = new Properties();
        properties.setProperty("snapshot.mode", "initial");
        properties.setProperty("decimal.handling.mode", "double"); // debezium decimal handling strategy
        properties.setProperty("database.serverTimezone", "GMT+8"); // debezium: properties prefixed with "database." are passed through to the jdbc url

        SourceFunction<String> sourceFunction = PostgreSQLSource.<String>builder()
                .hostname("192.168.58.201")
                .port(5432)
                .database("postgres") // monitor postgres database
                .schemaList("public")  // monitor inventory schema
                .tableList("public.sink2") // monitor products table
                .username("postgres")
                .password("123456")
                .decodingPluginName("pgoutput") // pg decoding plugin
                .slotName("t_table_slot") // replication slot name; must not be reused
                .deserializer(new JsonDebeziumDeserializationSchema()) // converts SourceRecord to JSON String
                .debeziumProperties(properties)
                .build();

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env
                .addSource(sourceFunction)
                .print().setParallelism(1); // use parallelism 1 for sink to keep message ordering

        env.execute();

    }
}
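What the job prints are Debezium change-event JSON strings. For orientation, an update on sink2 looks roughly like the sketch below (illustrative only; the real payload carries more metadata under source, and before is only populated once the table has REPLICA IDENTITY FULL):

```json
{
  "before": { "id": 1 },
  "after":  { "id": 2 },
  "source": { "db": "postgres", "schema": "public", "table": "sink2" },
  "op": "u",
  "ts_ms": 1650000000000
}
```

op is "c" for insert, "u" for update, "d" for delete, and "r" for rows read during the initial snapshot.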
 
Reading PG with the Flink SQL Table API
package pg;

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.TableResult;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;


public class FlinkCdcOracleExample {

    public static void main(String[] args) throws Exception {

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        env.disableOperatorChaining();

        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);

        String sourceDDL = "CREATE TABLE pg_source (\n" +
                "     ID INT, \n" +
                "     PRIMARY KEY (ID) NOT ENFORCED \n" +
                "     ) WITH (\n" +
                "     'connector' = 'postgres-cdc',\n" +
                "     'hostname' = '192.168.58.201',\n" +
                "     'port' = '5432',\n" +
                "     'username' = 'postgres',\n" +
                "     'password' = '123456',\n" +
                "     'database-name' = 'postgres',\n" +
                "     'schema-name' = 'public',\n" +
                "     'table-name' = 'sink2',\n" +
                "     'decoding.plugin.name' = 'pgoutput',\n" +
                "     'debezium.slot.name' = 'pg_source_slot'\n" + // slot name must be unique
                "     )";

        // run the source DDL; all work goes through the Table API,
        // so a separate env.execute() would fail with "no operators defined"
        tableEnv.executeSql(sourceDDL);
        TableResult tableResult = tableEnv.executeSql("select * from pg_source");
        tableResult.print();
    }
}
