FLINK SQL postgresql to oracle
flink-cdc实例官网Oracle CDC Connector — CDC Connectors for Apache Flink® documentationPostgres CDC Connector — CDC Connectors for Apache Flink® documentationOracle CDC Connector — CDC Connectors for Apache Flink® documentation
更改配置文件postgresql.conf 我的是在 /var/lib/pgsql/10/data/postgresql.conf # 更改wal日志方式为logical wal_level = logical # minimal, replica, or logical # 更改solts最大数量(默认值为10),flink-cdc默认一张表占用一个slots max_replication_slots = 20 # max number of replication slots # 更改wal发送最大进程数(默认值为10),这个值和上面的solts设置一样 max_wal_senders = 20 # max number of walsender processes # 中断那些停止活动超过指定毫秒数的复制连接,可以适当设置大一点(默认60s) wal_sender_timeout = 180s # in milliseconds; 0 disable (3)注意 wal_level是必须更改的,其它参数选着性更改,如果同步表数量超过10张建议修改为合适的值 更改配置文件postgresql.conf完成,需要重启pg服务生效,所以一般是在业务低峰期更改 1.创建postgresql 接收器
CREATE TABLE flink_pg(
ID INT,
PRIMARY KEY (ID) NOT ENFORCED
)WITH(
'connector' = 'postgres-cdc'
,'hostname' = 'PG库_IP地址'
,'port' = '5432'
,'username' = 'postgres'
,'password' = '123456'
,'database-name' = 'postgres'
,'schema-name'='public'
,'table-name' = 'sink2'
,'decoding.plugin.name'='pgoutput' //这个很关键,没这个参数数据出不来,后面讲都有哪些值
,'debezium.slot.name'='slot_2'); // 这个很关键,要不你建立多个会报错,程序启动不了
2. 错误: 无法访问文件 “decoderbufs”: 没有那个文件或目录
安装postgres-decoderbufs 环境(必须)
1.升级GCC/G++ 强烈推荐使用devtoolset方式,传统编译方式安装极为耗时且成功率极低.
[root@localhost ~]# sudo yum install devtoolset-4-gcc devtoolset-4-gcc++ devtoolset-4-gcc-c++ -y
[root@localhost ~]# scl enable devtoolset-4 bash
[root@localhost ~]# echo "source /opt/rh/devtoolset-4/enable" >> /etc/profile
[root@localhost ~]# source /etc/profile
2.devtoolset-4-gcc 无法安装
centos-release-scl
的镜像,发现官方不提供devtoolset-4
linux>wget https://copr.fedoraproject.org/coprs/hhorak/devtoolset-4-rebuild-bootstrap/repo/epel-7/hhorak-devtoolset-4-rebuild-bootstrap-epel-7.repo -O /etc/yum.repos.d/devtools-4.repo
linux>yum install devtoolset-4-gcc devtoolset-4-binutils devtoolset-4-gcc-c++
linux>scl enable devtoolset-4 bash
重新执行上一步即可
3.安装依赖环境
linux>yum install autoconf automake libtool readline-devel zlib-devel libxslt-devel json-c-devel pcre-devel unzip -y
4.升级autoconf(版本>=2.6.4)
[root@localhost ~]# wget ftp://ftp.gnu.org/gnu/autoconf/autoconf-2.69.tar.gz
[root@localhost ~]# tar zxvf autoconf-2.69.tar.gz
[root@localhost ~]# cd autoconf-2.69
[root@localhost ~]# ./configure --prefix=/usr/
[root@localhost ~]# make && make install
[root@localhost ~]# /usr/bin/autoconf -V
autoconf (GNU Autoconf) 2.69
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+/Autoconf: GNU GPL version 3 or later
<http://gnu.org/licenses/gpl.html>, <http://gnu.org/licenses/exceptions.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
5.下载并编译安装一下依赖环境
linux>wget http://download.osgeo.org/geos/geos-3.6.2.tar.bz2
linux>wget http://download.osgeo.org/proj/proj-4.9.3.tar.gz
linux>wget http://download.osgeo.org/gdal/2.2.3/gdal-2.2.3.tar.gz
linux>wget https://download.osgeo.org/postgis/source/postgis-2.3.7.tar.gz
*.以上四个安装包编译时不需要指定 --prefix=
linux>tar -jxvf geos-3.6.2.tar.bz2
linux>cd geos-3.6.2
linux>./configure
linux>make && make install
postgis编译时需要指定pg_config位置
linux>find / -name pg_config
linux>./configure --with-pgconfig=/usr/pgsql-10/bin/pg_config
可能需要安装部分依赖
linux>yum install postgresql10-server-dev
linux>yum install postgresql-devel
linux>yum install postgresql-devel10
linux>yum install postgresql10-devel
linux>yum install postgresql10-contrib
可能会缺少部分文件导致无法编译,需要找到并复制到pg的lib
linux>ldd /usr/pgsql-10/lib/postgis-2.3.so
linux>find / -name libgeos_c.so.1
linux>find / -name libproj.so.12
linux>cp /usr/local/lib/libgeos_c.so.1 /usr/pgsql-10/lib/
linux>cp /usr/local/lib/libproj.so.12 /usr/pgsql-10/lib/
6.如果上一步安装不上 pogtgis可以试试以下方法 yum 安装
yum install postgis23_10.x86_64 注意版本
- 先安装工具包
linux>Yum install wget net-tools epel-release
- 然后安装postgis
linux>yum install postgis32_14 postgis32_14-client -y
- 安装拓展工具
linux>yum install org_fdw10 -y linux>yum install pgrouting_14 -y
7.在PG中扩展Postgis
postgres=# CREATE EXTENSION postgis;
postgres=# CREATE EXTENSION postgis_topology;
postgres=# CREATE EXTENSION fuzzystrmatch;
postgres=# CREATE EXTENSION postgis_tiger_geocoder;
7.1.验证Postgis安装
(7.2 在PG中扩展Postgis时报错)
(7.2 解决办法)
linux>ldd /usr/pgsql-10/lib/rtpostgis-2.3.so 查看状态 发现 so.20 not found
linux>find / -name libgdal.so.20 查找文件
linux>cp /usr/local/lib/libgdal.so.20 /usr/pgsql-10/lib/ 复制到pg10的lib里面解决
8.安装ProtoBuf 3.3.0
[root@localhost ~]# wget https://github.com/google/protobuf/archive/v3.3.0.tar.gz
[root@localhost ~]# ./autogen.sh && ./configure --prefix=/usr/local/protobuf --libdir=/usr/lib64
[root@localhost ~]# make & make install
[root@localhost ~]# ldconfig
[root@localhost ~]# echo "export PATH=$PATH:/usr/local/protobuf/bin" >> /etc/profile
[root@localhost ~]# protoc --version
8.1.提前安装环境 否则ProtoBuf 3.3.0安装不了
由于谷歌被墙,这个可能安装困难,vim autogen.sh看一下里面是需要下载两个压缩包googlemock-release-1.7.0.zip,googletest-release-1.7.0.zip,手动下载,手动跟着这个文件的步骤,准备好再编译
linux>curl $curlopts -L -O https://github.com/google/googlemock/archive/release-1.7.0.zip
linux>unzip -q release-1.7.0.zip
linux>rm release-1.7.0.zip
linux>mv googlemock-release-1.7.0 gmock
linux>curl $curlopts -L -O https://github.com/google/googletest/archive/release-1.7.0.zip
linux>unzip -q release-1.7.0.zip
linux>rm release-1.7.0.zip
linux>mv googletest-release-1.7.0 gmock/gtest
9.安装ProtoBuf-C 1.2.1
[root@localhost ~]# wget https://github.com/protobuf-c/protobuf-c/archive/v1.2.1.tar.gz
[root@localhost ~]# ./autogen.sh && ./configure --prefix=/usr/local/protobuf-c --libdir=/usr/lib64/
[root@localhost ~]# make & make install
10.安装postgresql-decoderbufs
[postgres@localhost ~]$ wget https://github.com/debezium/postgres-decoderbufs/archive/v0.7.5.tar.gz
如果这个包编译报错试试这个包
GitHub - debezium/postgres-decoderbufs: A PostgreSQL logical decoder output plugin to deliver data as Protocol Buffers, used by Debezium (http://debezium.io). Please log issues in our JIRA at https://issues.jboss.org/projects/DBZ/issues
[postgres@localhost ~]$ tar xzvf v0.7.5.tar.gz
[postgres@localhost ~]$ make USE_PGXS=1 PG_CONFIG=/usr/pgsql-10/bin/pg_config
[postgres@localhost ~]$ make install USE_PGXS=1 PG_CONFIG=/usr/pgsql-10/bin/pg_config
[postgres@localhost ~]$ cat /var/lib/pgsql/10/data/postgresql.conf
listen_addresses = '*'
shared_preload_libraries = 'decoderbufs'
wal_level = logical
max_wal_senders = 10
wal_keep_segments = 4
max_replication_slots = 4
11.修改 shared_preload_libraries
shared_preload_libraries = 'decoderbufs'
12.重启PG数据库
linux>systemctl stop postgresql-10 停止
linux>systemctl start postgresql-10 启动
linux>systemctl restart postgresql-10 重启
linux>systemctl status postgresql-10 查看状态
13.验证,创建有以下结果无报错则成功
postgres=# select * from pg_create_logical_replication_slot('decoderbufs_demo', 'decoderbufs');
编译报错
解决方法:出现该情况是由于c++编译器的相关package没有安装,在终端上执行:
linux> sudo yum install glibc-headers gcc-c++
依赖报错 解决方法:
linux>sudo rpm -ivh https://dl.fedoraproject.org/pub/epel/7/x86_64/Packages/e/epel-release-7-14.noarch.rpm
flink cdc捕获postgresql数据
1)更改配置文件 需要更改 postgresql.conf
linux>vi postgresql.conf
# 更改wal日志方式为logical
wal_level = logical # minimal, replica, or logical
# 更改solts最大数量(默认值为10),flink-cdc默认一张表占用一个
slotsmax_replication_slots = 20 # max number of replication slots
# 更改wal发送最大进程数(默认值为10),这个值和上面的solts设置一样
max_wal_senders = 20 # max number of walsender processes
# 中断那些停止活动超过指定毫秒数的复制连接,可以适当设置大一点(默认60s)
wal_sender_timeout = 180s # in milliseconds; 0 disable
2)注意 注意:wal_level = logical源表的数据修改时,默认的逻辑复制流只包含历史记录的primary key,如果需要输出更新记录的历史记录的所有字段,需要在表级别修改参数:ALTER TABLE tableName REPLICA IDENTITY FULL; 这样才能捕获到源表所有字段更新后的值 3) 将jar包导入flink lib目录 flink-sql-connector-postgres-cdc-2.2.0.jar 到 flink/lib下 4)新建用户并且给用户复制流权限 -- pg新建用户 CREATE USER user WITH PASSWORD 'pwd'; 5) 给用户复制流权限 ALTER ROLE user replication; 6) 给用户登录数据库权限 grant CONNECT ON DATABASE test to user; 7)把当前库public下所有表查询权限赋给用户 GRANT SELECT ON ALL TABLES IN SCHEMA public TO user; 8) 发布表 -- 设置发布为true update pg_publication set puballtables=true where pubname is not null; -- 把所有表进行发布 CREATE PUBLICATION dbz_publication FOR ALL TABLES; -- 查询哪些表已经发布 select * from pg_publication_tables; 9) 更改表的复制标识包含更新和删除的值 -- 更改复制标识包含更新和删除之前值 ALTER TABLE test0425 REPLICA IDENTITY FULL; -- 查看复制标识(为f标识说明设置成功) select relreplident from pg_class where relname='test0425'; 到这一步,设置已经完全可以啦,上面步骤都是必须的
1.flink sql 端 创建postgresql 连接器
CREATE TABLE flink_cdc_source (
id INT,
name STRING
) WITH (
'connector' = 'postgres-cdc',
'hostname' = '192.168.58.201',
'port' = '5432',
'database-name' = 'postgres',
'schema-name' = 'public',
'username' = 'postgres',
'password' = '123456',
'table-name' = 'pg_cdc_source',
'decoding.plugin.name' = 'pgoutput'
);
2.错误: 复制槽名 "flink" 已经存在
(2.1 解决复制槽名 "flink" 已经存在)
1.切换用户
# su - postgres
2.登陆用户
-bash-4.2$ psql -U postgres
3. 查看复制槽
postgres=# select * from pg_replication_slots; 查看复制槽
4. 删除复制槽
SELECT * FROM pg_drop_replication_slot('flink'); 删除复制槽
5.验证
postgres=# select * from pg_replication_slots; 查看复制槽
flink sql 端 创建oracle 接收器
create table flink_cdc_sink (
ID INT,
NAME STRING
)with(
'connector' = 'jdbc',
'url' = 'jdbc:oracle:thin:@192.168.58.202:1521:ORA19C',
'username' = 'flinkuser',
'password' = 'flinkpw',
'table-name' = 'TEST2',
'driver' = 'oracle.jdbc.driver.OracleDriver');
报错
jdbc 连接oracle错误处理 解决方法:目前flink 1.14不支持jdbc 连接oracle 需要安装 flink 1.15 处理 Flink 1.15 安装 需要使用java11
1.官网下载java 11 https://www.oracle.com/java/technologies/downloads/#java11
2.解压 jdk tar 包
linux>tar -xzvf jdk-11.0.15.1_linux-x64_bin.tar.gz
3.修改环境配置文件
linux>vim /etc/profile
# Java11环境变量配置
JAVA_HOME=/devtools/java/java11/jdk-11.0.15
PATH=$JAVA_HOME/bin:$PATH
CLASSPATH=$JAVA_HOME/lib
export JAVA_HOME CLASSPATH PATH
# Java8环境变量配置
JAVA_HOME=/devtools/java/java8/jdk1.8.0_321
PATH=$PATH:$JAVA_HOME/bin:$PATH
CLASSPATH=$JAVA_HOME/lib
export JAVA_HOME PATH CLASSPATH
4.重启电脑生效
5.下载flink 1.15
linux>Wget https://dlcdn.apache.org/flink/flink-1.15.0/flink-1.15.0-bin-scala_2.12.tgz
6.配置 flink 1.15
vim conf/flink-conf.yaml
jobmanager.rpc.address: jobIP地址
# 配置high-availability mode
high-availability: zookeeper
# JobManager的meta信息放在dfs,在zk上主要会保存一个指向dfs路径的指针
high-availability.storageDir: hdfs://cluster/flinkha/
# 配置zookeeper quorum(hostname和端口需要依据对应zk的实际配置)
high-availability.zookeeper.quorum: IPA:2181,IPB:2181,IPC:2181
# (可选)设置zookeeper的root目录
#high-availability.zookeeper.path.root: /test_dir/test_standalone2_root
# 注释以下配置
# jobmanager.bind-host: localhost
# taskmanager.bind-host: localhost
#taskmanager.host: localhost
#rest.address: localhost
#rest.bind-address: localhost
#配置yarn 高可用重试次数
yarn.application-attempts: 10
注意:必须要操作上面的“注释以下配置” 否则Web UI 访问不了 其余配置一样,可以参考最上面的搭建。
Flink CDC 实现Postgres变更捕获 (java)
package pg;
import com.ververica.cdc.connectors.postgres.PostgreSQLSource;
import com.ververica.cdc.debezium.JsonDebeziumDeserializationSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.SourceFunction;
import java.util.Properties;
public class FlinkCdcPg {
public static void main(String[] args) throws Exception {
Properties properties = new Properties();
properties.setProperty("snapshot.mode", "initial");
properties.setProperty("decimal.handling.mode", "double"); //debezium 小数转换处理策略
properties.setProperty("database.serverTimezone", "GMT+8"); //debezium 配置以database. 开头的属性将被传递给jdbc url
SourceFunction<String> sourceFunction = PostgreSQLSource.<String>builder()
.hostname("192.168.58.201")
.port(5432)
.database("postgres") // monitor postgres database
.schemaList("public") // monitor inventory schema
.tableList("public.sink2") // monitor products table
.username("postgres")
.password("123456")
.decodingPluginName("pgoutput") // pg解码插件
.slotName("t_table_slot") // 复制槽名称 不能重复
.deserializer(new JsonDebeziumDeserializationSchema()) // converts SourceRecord to JSON String
.debeziumProperties(properties)
.build();
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env
.addSource(sourceFunction)
.print().setParallelism(1); // use parallelism 1 for sink to keep message ordering
env.execute();
}
}
Flink SQL TABLE pg读取
package pg;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.TableResult;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
public class FlinkCdcOracleExample {
public static void main(String[] args) throws Exception {
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(1);
env.disableOperatorChaining();
StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);
String sourceDDL ="CREATE TABLE pg_source (\n" +
" ID INT, \n" +
" PRIMARY KEY (ID) NOT ENFORCED \n" +
" ) WITH (\n" +
" 'connector' = 'postgres-cdc',\n" +
" 'hostname' = '192.168.58.201',\n" +
" 'port' = '5432',\n" +
" 'username' = 'postgres',\n" +
" 'password' = '123456',\n" +
" 'database-name' = 'postgres',\n" +
" 'schema-name' = 'public',\n" + // 注意这里要大写
" 'table-name' = 'sink2',\n" +
" 'debezium.log.mining.strategy'='online_catalog'\n" +
)";
//执行source表ddl
tableEnv.executeSql(sourceDDL);
TableResult tableResult = tableEnv.executeSql("select * from pg_source");
tableResult.print();
env.execute();
}
}