Flink CDC SQL Oracle To Kafka To ES

Oracle CDC Connector

Download flink-sql-connector-oracle-cdc-2.2.1.jar and place it under <FLINK_HOME>/lib/.

Note: flink-sql-connector-oracle-cdc-XXX-SNAPSHOT versions correspond to the development branch; to use one you need to download the source code and build the jar yourself. Users are encouraged to use released versions instead, such as flink-sql-connector-oracle-cdc-2.2.1.jar, which are available from the Maven central repository.

For non-CDB databases

  1. Enable log archiving

    (1.1). Connect to the database as DBA

    ORACLE_SID=SID
    export ORACLE_SID
    sqlplus /nolog
      CONNECT sys/password AS SYSDBA

    (1.2). Enable log archiving

    alter system set db_recovery_file_dest_size = 10G;
    alter system set db_recovery_file_dest = '/opt/oracle/oradata/recovery_area' scope=spfile;
    shutdown immediate;
    startup mount;
    alter database archivelog;
    alter database open;

    Note:

    • Enabling log archiving requires a database restart, so take care when trying this out.

    • Archived logs take up a large amount of disk space, so consider cleaning up expired logs regularly.

    (1.3). Check whether log archiving is enabled

    -- Should now show "Database log mode: Archive Mode"
    archive log list;
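
    You can also check the data dictionary directly (a quick alternative check; this assumes you are still connected as SYSDBA):

    -- Should return "ARCHIVELOG"
    SELECT log_mode FROM v$database;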

    Note:

    Supplemental logging must be enabled for the captured tables, or for the whole database, so that data changes capture the pre-change state of the changed database rows. The following shows how to configure this at the table level and at the database level.

    -- Enable supplemental logging for a specific table:
    ALTER TABLE inventory.customers ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
    
    -- Enable supplemental logging for database
    ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;
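
    To verify that supplemental logging is active, you can query the data dictionary (a minimal check; the database-level flag should read 'YES' after ALTER DATABASE ADD SUPPLEMENTAL LOG DATA):

    -- Database-level minimal supplemental logging flag
    SELECT supplemental_log_data_min FROM v$database;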
  2. Create an Oracle user with permissions

    (2.1). Create a tablespace

    sqlplus sys/password@host:port/SID AS SYSDBA;
      CREATE TABLESPACE logminer_tbs DATAFILE '/opt/oracle/oradata/SID/logminer_tbs.dbf' SIZE 25M REUSE AUTOEXTEND ON MAXSIZE UNLIMITED;
      exit;

    (2.2). Create the user and grant it permissions

    sqlplus sys/password@host:port/SID AS SYSDBA;
      CREATE USER flinkuser IDENTIFIED BY flinkpw DEFAULT TABLESPACE LOGMINER_TBS QUOTA UNLIMITED ON LOGMINER_TBS;
      GRANT CREATE SESSION TO flinkuser;
      GRANT SET CONTAINER TO flinkuser;
      GRANT SELECT ON V_$DATABASE to flinkuser;
      GRANT FLASHBACK ANY TABLE TO flinkuser;
      GRANT SELECT ANY TABLE TO flinkuser;
      GRANT SELECT_CATALOG_ROLE TO flinkuser;
      GRANT EXECUTE_CATALOG_ROLE TO flinkuser;
      GRANT SELECT ANY TRANSACTION TO flinkuser;
      GRANT LOGMINING TO flinkuser;
    
      GRANT CREATE TABLE TO flinkuser;
      GRANT LOCK ANY TABLE TO flinkuser;
      GRANT ALTER ANY TABLE TO flinkuser;
      GRANT CREATE SEQUENCE TO flinkuser;
    
      GRANT EXECUTE ON DBMS_LOGMNR TO flinkuser;
      GRANT EXECUTE ON DBMS_LOGMNR_D TO flinkuser;
    
      GRANT SELECT ON V_$LOG TO flinkuser;
      GRANT SELECT ON V_$LOG_HISTORY TO flinkuser;
      GRANT SELECT ON V_$LOGMNR_LOGS TO flinkuser;
      GRANT SELECT ON V_$LOGMNR_CONTENTS TO flinkuser;
      GRANT SELECT ON V_$LOGMNR_PARAMETERS TO flinkuser;
      GRANT SELECT ON V_$LOGFILE TO flinkuser;
      GRANT SELECT ON V_$ARCHIVED_LOG TO flinkuser;
      GRANT SELECT ON V_$ARCHIVE_DEST_STATUS TO flinkuser;
      exit;
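
    To sanity-check the setup, you can list the privileges the user actually holds (a quick query against the standard data dictionary views; run it as DBA, and note that grantee names are stored in upper case):

    SELECT privilege FROM dba_sys_privs WHERE grantee = 'FLINKUSER';
    SELECT table_name, privilege FROM dba_tab_privs WHERE grantee = 'FLINKUSER';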

For CDB databases

Overall, the steps for configuring a CDB database are very similar to those for a non-CDB database, although some of the commands differ.

  1. Enable log archiving

    ORACLE_SID=ORCLCDB
    export ORACLE_SID
    sqlplus /nolog
      CONNECT sys/password AS SYSDBA
      alter system set db_recovery_file_dest_size = 10G;
      -- the recovery area directory should already exist
      alter system set db_recovery_file_dest = '/opt/oracle/oradata/recovery_area' scope=spfile;
      shutdown immediate
      startup mount
      alter database archivelog;
      alter database open;
      -- Should show "Database log mode: Archive Mode"
      archive log list
      exit;

    Note: supplemental logging can also be enabled with the following commands:

    -- Enable supplemental logging for a specific table:
    ALTER TABLE inventory.customers ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
    
    -- Enable supplemental logging for database
    ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;
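
    In a CDB setup these statements apply to the container you are currently connected to; to configure a table inside a specific PDB (ORCLPDB1 in this walkthrough), switch containers first (a minimal sketch):

    ALTER SESSION SET CONTAINER = ORCLPDB1;
    ALTER TABLE inventory.customers ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;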
  2. Create an Oracle user with permissions

    sqlplus sys/password@//localhost:1521/ORCLCDB as sysdba
      CREATE TABLESPACE logminer_tbs DATAFILE '/opt/oracle/oradata/ORCLCDB/logminer_tbs.dbf' SIZE 25M REUSE AUTOEXTEND ON MAXSIZE UNLIMITED;
      exit
    sqlplus sys/password@//localhost:1521/ORCLPDB1 as sysdba
      CREATE TABLESPACE logminer_tbs DATAFILE '/opt/oracle/oradata/ORCLCDB/ORCLPDB1/logminer_tbs.dbf' SIZE 25M REUSE AUTOEXTEND ON MAXSIZE UNLIMITED;
      exit
    sqlplus sys/password@//localhost:1521/ORCLCDB as sysdba
      CREATE USER flinkuser IDENTIFIED BY flinkpw DEFAULT TABLESPACE logminer_tbs QUOTA UNLIMITED ON logminer_tbs CONTAINER=ALL;
      GRANT CREATE SESSION TO flinkuser CONTAINER=ALL;
      GRANT SET CONTAINER TO flinkuser CONTAINER=ALL;
      GRANT SELECT ON V_$DATABASE to flinkuser CONTAINER=ALL;
      GRANT FLASHBACK ANY TABLE TO flinkuser CONTAINER=ALL;
      GRANT SELECT ANY TABLE TO flinkuser CONTAINER=ALL;
      GRANT SELECT_CATALOG_ROLE TO flinkuser CONTAINER=ALL;
      GRANT EXECUTE_CATALOG_ROLE TO flinkuser CONTAINER=ALL;
      GRANT SELECT ANY TRANSACTION TO flinkuser CONTAINER=ALL;
      GRANT LOGMINING TO flinkuser CONTAINER=ALL;
      GRANT CREATE TABLE TO flinkuser CONTAINER=ALL;
      GRANT LOCK ANY TABLE TO flinkuser CONTAINER=ALL;
      GRANT CREATE SEQUENCE TO flinkuser CONTAINER=ALL;
    
      GRANT EXECUTE ON DBMS_LOGMNR TO flinkuser CONTAINER=ALL;
      GRANT EXECUTE ON DBMS_LOGMNR_D TO flinkuser CONTAINER=ALL;
    
      GRANT SELECT ON V_$LOG TO flinkuser CONTAINER=ALL;
      GRANT SELECT ON V_$LOG_HISTORY TO flinkuser CONTAINER=ALL;
      GRANT SELECT ON V_$LOGMNR_LOGS TO flinkuser CONTAINER=ALL;
      GRANT SELECT ON V_$LOGMNR_CONTENTS TO flinkuser CONTAINER=ALL;
      GRANT SELECT ON V_$LOGMNR_PARAMETERS TO flinkuser CONTAINER=ALL;
      GRANT SELECT ON V_$LOGFILE TO flinkuser CONTAINER=ALL;
      GRANT SELECT ON V_$ARCHIVED_LOG TO flinkuser CONTAINER=ALL;
      GRANT SELECT ON V_$ARCHIVE_DEST_STATUS TO flinkuser CONTAINER=ALL;
      exit

See more information about setting up Oracle.

How to create an Oracle CDC table

An Oracle CDC table can be defined as follows:

-- register an Oracle table 'products' in Flink SQL
Flink SQL> CREATE TABLE products (
     ID INT NOT NULL,
     NAME STRING,
     DESCRIPTION STRING,
     WEIGHT DECIMAL(10, 3),
     PRIMARY KEY(ID) NOT ENFORCED
     ) WITH (
     'connector' = 'oracle-cdc',
     'hostname' = 'localhost',
     'port' = '1521',
     'username' = 'flinkuser',
     'password' = 'flinkpw',
     'database-name' = 'XE',
     'schema-name' = 'inventory',
     'table-name' = 'products');
  
-- read the snapshot and redo log changes from the products table
Flink SQL> SELECT * FROM products;

Note: when using the CDB + PDB model, an extra option needs to be added to the Flink DDL to specify the name of the PDB to connect to: 'debezium.database.pdb.name' = 'xxx'.
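
For example, a minimal sketch of the DDL for the CDB + PDB setup from the previous section (hostname and credentials are placeholders; the PDB is ORCLPDB1):

CREATE TABLE products (
     ID INT NOT NULL,
     NAME STRING,
     DESCRIPTION STRING,
     WEIGHT DECIMAL(10, 3),
     PRIMARY KEY(ID) NOT ENFORCED
     ) WITH (
     'connector' = 'oracle-cdc',
     'hostname' = 'localhost',
     'port' = '1521',
     'username' = 'flinkuser',
     'password' = 'flinkpw',
     'database-name' = 'ORCLCDB',
     'schema-name' = 'inventory',
     'table-name' = 'products',
     'debezium.database.pdb.name' = 'ORCLPDB1');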

Connector options

  • connector (required, no default, String): specifies which connector to use; here it must be 'oracle-cdc'.
  • hostname (required, no default, String): IP address or hostname of the Oracle database server.
  • username (required, no default, String): username to use when connecting to the Oracle database server.
  • password (required, no default, String): password to use when connecting to the Oracle database server.
  • database-name (required, no default, String): database name of the Oracle server to monitor.
  • schema-name (required, no default, String): schema name of the Oracle database to monitor.
  • table-name (required, no default, String): table name of the Oracle database to monitor.
  • port (optional, default 1521, Integer): integer port number of the Oracle database server.
  • scan.startup.mode (optional, default 'initial', String): startup mode for the Oracle CDC consumer; valid enumerations are 'initial' and 'latest-offset'. See the Startup Reading Position section for details.
  • debezium.* (optional, no default, String): passes Debezium properties through to the Debezium embedded engine, which is used to capture data changes from the Oracle server. For example: 'debezium.snapshot.mode' = 'never'. See Debezium's Oracle connector properties for more information.

Limitations

While scanning the snapshot of a database table, checkpoints cannot be performed because there is no recoverable position during the scan. To avoid performing checkpoints, the Oracle CDC source keeps the pending checkpoint waiting until it times out. A timed-out checkpoint is treated as a failed checkpoint, which by default triggers a failover of the Flink job. So if the database table is large, it is recommended to add the following Flink configuration to avoid failovers caused by timed-out checkpoints:

execution.checkpointing.interval: 10min
execution.checkpointing.tolerable-failed-checkpoints: 100
restart-strategy: fixed-delay
restart-strategy.fixed-delay.attempts: 2147483647

Startup Reading Position

The configuration option scan.startup.mode specifies the startup mode of the Oracle CDC consumer. The valid enumerations are:

  • initial (default): perform an initial snapshot of the monitored database tables on first startup, then continue reading the latest redo log entries.

  • latest-offset: never perform a snapshot of the monitored database tables on first startup; read only the changes made since the connector started.

Note: the mechanism of the scan.startup.mode option relies on Debezium's snapshot.mode configuration, so do not use the two together. If you specify both scan.startup.mode and debezium.snapshot.mode in the table DDL, scan.startup.mode may not take effect.
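
For example, a minimal sketch that reuses the products table from above but skips the snapshot phase (connection values are the same placeholders):

CREATE TABLE products_latest (
     ID INT NOT NULL,
     NAME STRING,
     PRIMARY KEY(ID) NOT ENFORCED
     ) WITH (
     'connector' = 'oracle-cdc',
     'hostname' = 'localhost',
     'port' = '1521',
     'username' = 'flinkuser',
     'password' = 'flinkpw',
     'database-name' = 'XE',
     'schema-name' = 'inventory',
     'table-name' = 'products',
     'scan.startup.mode' = 'latest-offset');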

A full Flink CDC SQL Oracle To Kafka To ES example:

import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.SqlDialect;
import org.apache.flink.table.api.TableResult;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class CP_LOGIN_INFO {
    private static final Logger log = LoggerFactory.getLogger(CP_LOGIN_INFO.class);
    public static void main(String[] args) throws Exception {

        EnvironmentSettings fsSettings = EnvironmentSettings.newInstance() // build the environment settings
                .useBlinkPlanner()
                .inStreamingMode()
                .build();
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);  // set the parallelism of the job to 1

        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env, fsSettings); // create the stream table environment
        tableEnv.getConfig().setSqlDialect(SqlDialect.DEFAULT);

        log.info("-----------------> start");  // log job startup

        // configure checkpointing
        env.enableCheckpointing(180000); // enable checkpointing every 180,000 ms (3 minutes)
        env.getCheckpointConfig().setMinPauseBetweenCheckpoints(50000); // leave at least 50,000 ms between checkpoints
        env.getCheckpointConfig().setCheckpointTimeout(600000); // a checkpoint must complete within 600,000 ms or it is discarded
        env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE); // exactly-once mode; EXACTLY_ONCE/AT_LEAST_ONCE are supported
        env.getCheckpointConfig().setMaxConcurrentCheckpoints(1); // at most one checkpoint in flight at a time
        env.getCheckpointConfig().setCheckpointStorage("hdfs:///flink-checkpoints/oracle/CP_LOGIN_INFO"); // store checkpoints under this HDFS directory
        env.getCheckpointConfig().enableExternalizedCheckpoints(CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION); // retain externalized checkpoints after the job is cancelled
        env.getCheckpointConfig().enableUnalignedCheckpoints(); // enable experimental unaligned checkpoints



        String sourceDDL = "CREATE TABLE Oracle_Source (\n" +
                "     ID DECIMAL(12,0), \n" +
                "     USER_CODE STRING, \n" +
                "     LOGIN_TIME STRING, \n" +
                "     OVER_TIME STRING, \n" +
                "     TOKEN STRING, \n" +
                "     INSERT_TIME_HIS STRING, \n" +
                "     UPDATE_TIME_HIS STRING, \n" +
                "     VERSION STRING, \n" +
                "     PRIMARY KEY (ID) NOT ENFORCED \n" +
                "     ) WITH (\n" +
                "     'connector' = 'oracle-cdc',\n" +
                "     'hostname' = 'Oracle_IP地址',\n" +
                "     'port' = '1521',\n" +
                "     'username' = 'userxxx',\n" +
                "     'password' = 'pwdxxx',\n" +
                "     'database-name' = 'ORCL',\n" +
                "     'schema-name' = 'Oracle的schema-name',\n" +           // 注意这里要大写
                "     'table-name' = 'CP_LOGIN_INFO',\n" +
                "     'debezium.log.mining.strategy'='online_catalog',\n" +
                "     'debezium.log.mining.batch.size.max'='30000000',\n" +
                "     'debezium.log.mining.batch.size.min'='10000',\n" +
                "     'debezium.log.mining.batch.size.default'='2000000'\n" +
                "     )";

        // create the Kafka sink table
        String KafkasinkDDL = "CREATE TABLE SinkKafka (\n" +
                "     ID DECIMAL(12,0), \n" +
                "     USER_CODE STRING, \n" +
                "     LOGIN_TIME BIGINT, \n" +
                "     OVER_TIME BIGINT, \n" +
                "     TOKEN STRING, \n" +
                "     INSERT_TIME_HIS BIGINT, \n" +
                "     UPDATE_TIME_HIS BIGINT, \n" +
                "     VERSION STRING, \n" +
                "     PRIMARY KEY (ID) NOT ENFORCED \n" +
                ") WITH (\n" +
                " 'connector' = 'kafka',\n" +
                " 'topic' = 'CP_LOGIN_INFO',\n" +
                " 'properties.bootstrap.servers' =  'kafka_IP_A:9092,kafka_IP_B:9092,kafka_IP_C:9092',\n" +  //一定要加且加在index的签名
                " 'properties.group.id' = 'test-consumer-group',\n" +
                " 'scan.startup.mode' = 'earliest-offset',\n" +
                " 'format' = 'debezium-json', \n" +
                " 'debezium-json.ignore-parse-errors'='true' \n" +
                ")";

        // shift the epoch-millisecond timestamp columns back 8 hours (8 * 60 * 60 * 1000 ms, an apparent UTC+8 to UTC adjustment) while copying into Kafka
        String transformSQL =
                " INSERT INTO SinkKafka SELECT ID,\n" +
                        "USER_CODE,\n" +
                        "(CAST(LOGIN_TIME AS BIGINT) - 8 * 60 * 60 * 1000) as LOGIN_TIME,\n" +
                        "(CAST(OVER_TIME AS BIGINT) - 8 * 60 * 60 * 1000) as OVER_TIME,\n" +
                        "TOKEN,\n" +
                        "(CAST(INSERT_TIME_HIS AS BIGINT) - 8 * 60 * 60 * 1000) as INSERT_TIME_HIS,\n" +
                        "(CAST(UPDATE_TIME_HIS AS BIGINT) - 8 * 60 * 60 * 1000) as UPDATE_TIME_HIS,\n" +
                        "VERSION FROM Oracle_Source ";

        String sinkDDL = "CREATE TABLE SinkES (\n" +
                "     ID DECIMAL(12,0), \n" +
                "     USER_CODE STRING, \n" +
                "     LOGIN_TIME BIGINT, \n" +
                "     OVER_TIME BIGINT, \n" +
                "     TOKEN STRING, \n" +
                "     INSERT_TIME_HIS BIGINT, \n" +
                "     UPDATE_TIME_HIS BIGINT, \n" +
                "     VERSION STRING, \n" +
                "     PRIMARY KEY (ID) NOT ENFORCED \n" +
                ") WITH (\n" +
                " 'connector' = 'elasticsearch-7',\n" +
                " 'hosts' = 'http://ES_IP地址:9200',\n" +
                " 'format' = 'json',\n" +  //一定要加且加在index的签名
                " 'index' = 'cp_login_info_testES',\n" +
                " 'username' = 'userxxx',\n" +
                " 'password' = 'pwdxxx',\n" +
                " 'failure-handler' = 'ignore',\n" +
                " 'sink.flush-on-checkpoint' = 'true' ,\n"+
                " 'sink.bulk-flush.max-actions' = '20000' ,\n"+
                " 'sink.bulk-flush.max-size' = '2mb' ,\n"+
                " 'sink.bulk-flush.interval' = '1000ms' ,\n"+
                " 'sink.bulk-flush.backoff.strategy' = 'CONSTANT',\n"+
                " 'sink.bulk-flush.backoff.max-retries' = '3',\n"+
                " 'connection.max-retry-timeout' = '3153600000000',\n"+
                " 'sink.bulk-flush.backoff.delay' = '100ms'\n"+
                ")";

        String transformSQL1 =
                " INSERT INTO SinkES  SELECT ID,\n" +
                        "USER_CODE,\n" +
                        "LOGIN_TIME,\n" +
                        "OVER_TIME,\n" +
                        "TOKEN,\n" +
                        "INSERT_TIME_HIS,\n" +
                        "UPDATE_TIME_HIS,\n" +
                        "VERSION FROM  SinkKafka " ;

        // execute the source table DDL
        tableEnv.executeSql(sourceDDL);

        // execute the sink table DDLs
        tableEnv.executeSql(KafkasinkDDL);
        tableEnv.executeSql(sinkDDL);

        // run the transformation SQL; each executeSql(INSERT ...) submits its own Flink job
        TableResult tableResult = tableEnv.executeSql(transformSQL);
        TableResult tableResult1 = tableEnv.executeSql(transformSQL1);
        tableResult.print();
        tableResult1.print();
        // note: no env.execute() here -- executeSql() already submits the INSERT jobs,
        // and calling env.execute() without any DataStream operators would fail
    }
}
