Flink SQL 1.12 (2): MySQL Source

This article walks through two ways of using MySQL as a data source in Flink: a full-table cached lookup and incremental synchronization via CDC. The cached approach creates a JDBC connection through Flink SQL and sets cache parameters to pull the data; the incremental approach uses Flink's mysql-cdc connector together with MySQL's binlog to capture data changes in real time. Key code examples are included, along with common pitfalls and configuration notes encountered along the way.

Contents

2. MySQL data sources

2.1 MySQL full-table cache approach

2.1.1 Flink SQL

2.1.2 Data preparation

2.1.3 sql-client

2.1.4 Code

2.1.5 Pitfalls

2.1.6 Key configuration

2.2 MySQL CDC incremental approach

2.2.1 Flink SQL

2.2.2 Data preparation

2.2.3 sql-client

2.2.4 Code

2.2.5 Pitfalls

2.2.6 Configuration


2. MySQL data sources

2.1 MySQL full-table cache approach

2.1.1 Flink SQL

CREATE TABLE `triggers_mysql` (
  `trigger_id` INT NOT NULL,
  `trigger_source` STRING,
  `modify_time` BIGINT NOT NULL,
  `enc_type` TINYINT
) WITH (
  'connector' = 'jdbc',
  'url' = 'jdbc:mysql://localhost:3306/azkaban',
  'table-name' = 'triggers_test',
  'username' = 'root',
  'password' = '123456',
  'lookup.cache.max-rows' = '100',
  'lookup.cache.ttl' = '10s',
  'lookup.max-retries' = '3'
);

select * from triggers_mysql;
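
Note that the lookup.cache.* options above only take effect when this table is used as a dimension table in a lookup join; a plain select simply scans the table once. Below is a minimal sketch of such a join, assuming a hypothetical streaming table `orders` with a processing-time attribute:

-- Hypothetical probe-side table; only trigger_src_id and proc_time matter here
CREATE TABLE orders (
  order_id INT,
  trigger_src_id INT,
  proc_time AS PROCTIME()  -- processing-time attribute, required for lookup joins
) WITH (
  'connector' = 'datagen'
);

-- Each probe checks the lookup cache first and only queries MySQL on a miss
SELECT o.order_id, t.trigger_source
FROM orders AS o
JOIN triggers_mysql FOR SYSTEM_TIME AS OF o.proc_time AS t
  ON o.trigger_src_id = t.trigger_id;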

2.1.2 Data preparation

All you need is a MySQL table with some data in it.

Pay attention to the type mapping between the Flink SQL columns and the MySQL columns.
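
For reference, here is a MySQL-side table that lines up with the Flink DDL above (a sketch; MySQL int maps to Flink INT, varchar to STRING, bigint to BIGINT, tinyint to TINYINT):

-- MySQL side; the sample row is made up
CREATE TABLE triggers_test (
  trigger_id     INT NOT NULL,
  trigger_source VARCHAR(128),
  modify_time    BIGINT NOT NULL,
  enc_type       TINYINT,
  PRIMARY KEY (trigger_id)
);

INSERT INTO triggers_test VALUES (1, 'SimpleTimeTrigger', 1609430400000, 1);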

2.1.3 sql-client

(1) jar

flink-connector-jdbc_2.11-1.12.0.jar

For reference:

wget https://repo.maven.apache.org/maven2/org/apache/flink/flink-connector-jdbc_2.11/1.12.0/flink-connector-jdbc_2.11-1.12.0.jar

mysql-connector-java-5.1.48.jar

flink-connector-mysql-cdc-1.1.1.jar

2.1.4 Code

(1) Code


import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

import java.io.Serializable;

public class FlinkMysqlAllSource implements Serializable {

    public static final String SQL = " CREATE TABLE " +
            " `triggers_mysql` ( \n" +
            " `trigger_id` INT NOT NULL ,\n" +
            " `trigger_source` STRING,\n" +
            " `modify_time` BIGINT NOT NULL,\n" +
            " `enc_type` TINYINT\n" +
            "  ) with ( " +
            "  'connector' = 'jdbc' ," +
            "  'url' = 'jdbc:mysql://localhost:3306/azkaban', " +
            "  'table-name' = 'triggers_test'," +
            "  'username' = 'root', " +
            "  'password' = '123456', " +
            "  'lookup.cache.max-rows' = '100', " +
            "  'lookup.cache.ttl' = '10s', " +
            "  'lookup.max-retries' = '3' ) ";

    public static void main(String[] args) {

        // Build the StreamExecutionEnvironment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Build EnvironmentSettings and select the Blink planner
        EnvironmentSettings bsSettings = EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build();

        // Build the StreamTableEnvironment
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env, bsSettings);

        // Register the JDBC source table
        tEnv.executeSql(SQL);

        // Run the query
        Table table = tEnv.sqlQuery("select * from triggers_mysql");

        // Convert back to a DataStream and print
        DataStream<Row> rowDataStream = tEnv.toAppendStream(table, Row.class);
        rowDataStream.map(new MapFunction<Row, Row>() {
            @Override
            public Row map(Row row) throws Exception {
                System.out.println(row);
                return row;
            }
        }).print("sql==").setParallelism(1);

        // Start the job; without this line nothing runs!
        try {
            env.execute("test");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

(2) pom

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-jdbc_2.11</artifactId>
    <version>1.12.0</version>
</dependency>

2.1.5 Pitfalls

(1) When reading (lookups), mind these two parameters:

'lookup.cache.ttl' = '10s' -- maximum time-to-live of a cached row

'lookup.max-retries' = '3' -- maximum number of retries when a lookup against the database fails

(2) When writing, mind these two parameters:

sink.buffer-flush.max-rows

sink.buffer-flush.interval

2.1.6 Key configuration

sink.buffer-flush.max-rows -- how many rows to buffer before a flush is triggered

sink.buffer-flush.interval -- how long to wait before a flush is triggered; the two conditions are evaluated together, and a flush happens when either one is met

sink.max-retries -- maximum number of retries if writing a record to the database fails

lookup.max-retries -- maximum number of retries when a lookup against the database fails

lookup.cache.ttl -- maximum time-to-live of each cached row

lookup.cache.max-rows -- maximum number of rows held in the lookup cache

scan.fetch-size -- number of rows fetched from the database per round trip

scan.auto-commit -- sets the auto-commit flag on the JDBC driver

The rest should be self-explanatory, so I won't go through them.
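
To make the sink-side options concrete, here is a sketch of a JDBC sink table (the target table `triggers_copy` is made up for illustration); a flush fires as soon as either threshold is reached:

CREATE TABLE triggers_sink (
  trigger_id INT,
  trigger_source STRING,
  modify_time BIGINT,
  enc_type TINYINT,
  PRIMARY KEY (trigger_id) NOT ENFORCED
) WITH (
  'connector' = 'jdbc',
  'url' = 'jdbc:mysql://localhost:3306/azkaban',
  'table-name' = 'triggers_copy',         -- hypothetical target table
  'username' = 'root',
  'password' = '123456',
  'sink.buffer-flush.max-rows' = '100',   -- flush after 100 buffered rows...
  'sink.buffer-flush.interval' = '1s',    -- ...or after 1 second, whichever comes first
  'sink.max-retries' = '3'                -- retry a failed write up to 3 times
);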

2.2 MySQL CDC incremental approach

2.2.1 Flink SQL

CREATE TABLE `triggers_cdc` (
  `trigger_id` INT,
  `trigger_source` STRING NULL,
  `modify_time` BIGINT NOT NULL,
  `enc_type` TINYINT NULL,
  PRIMARY KEY (`trigger_id`) NOT ENFORCED
) WITH (
  'connector' = 'mysql-cdc',
  'hostname' = 'localhost',
  'port' = '3306',
  'username' = 'root',
  'password' = '123456',
  'database-name' = 'azkaban',
  'table-name' = 'triggers_test'
);

2.2.2 Data preparation

Here you only need a MySQL instance with binlog enabled; for how to enable binlog in MySQL, please look it up online.
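
As a quick sanity check (assuming you can log in with sufficient privileges), you can confirm from a MySQL client that binlog is on and row-based, which is what the CDC connector needs:

SHOW VARIABLES LIKE 'log_bin';        -- should be ON
SHOW VARIABLES LIKE 'binlog_format';  -- should be ROW
SHOW MASTER STATUS;                   -- current binlog file and position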

2.2.3 sql-client

(1) jar

wget https://repo1.maven.org/maven2/com/alibaba/ververica/flink-connector-mysql-cdc/1.1.1/flink-connector-mysql-cdc-1.1.1.jar

2.2.4 Code

(1) Code


import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

import java.io.Serializable;

public class FlinkMysqlCdcSource implements Serializable {

    public static final String SQL = " CREATE TABLE `triggers_cdc` ( \n" +
            "`trigger_id` INT ,\n" +
            "`trigger_source` STRING NULL,\n" +
            "`modify_time` BIGINT NOT NULL,\n" +
            "`enc_type` TINYINT NULL,\n" +
            "PRIMARY KEY (`trigger_id`) NOT ENFORCED\n" +
            " ) with ( \n" +
            "  'connector' = 'mysql-cdc',\n" +
            "  'hostname' = 'localhost',\n" +
            "  'port' = '3306',\n" +
            "  'username' = 'root',\n" +
            "  'password' = '123456',\n" +
            "  'database-name' = 'azkaban',\n" +
            "  'table-name' = 'triggers_test'\n" +
            ") ";

    public static void main(String[] args) {

        // Build the StreamExecutionEnvironment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Build EnvironmentSettings and select the Blink planner
        EnvironmentSettings bsSettings = EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build();

        // Build the StreamTableEnvironment
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env, bsSettings);

        // Register the CDC source table
        tEnv.executeSql(SQL);

        // Run the query
        Table table = tEnv.sqlQuery("select * from triggers_cdc");

        // The CDC table emits updates and deletes, so convert to a retract stream:
        // f0 == true marks an add record, f0 == false a retract record
        DataStream<Tuple2<Boolean, Row>> dataStream = tEnv.toRetractStream(table, Row.class);
        dataStream.map(new MapFunction<Tuple2<Boolean, Row>, Row>() {
            @Override
            public Row map(Tuple2<Boolean, Row> booleanRowTuple2) throws Exception {
                System.out.println(booleanRowTuple2.f0 + "--" + booleanRowTuple2.f1);
                // Keep only add records; retract records become null and are filtered below
                return booleanRowTuple2.f0 ? booleanRowTuple2.f1 : null;
            }
        }).filter(new FilterFunction<Row>() {
            @Override
            public boolean filter(Row row) throws Exception {
                return row != null;
            }
        }).print("triggers_cdc ").setParallelism(1);

        // Start the job; without this line nothing runs!
        try {
            env.execute("test");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

(2) pom

<dependency>
    <groupId>com.alibaba.ververica</groupId>
    <artifactId>flink-connector-mysql-cdc</artifactId>
    <version>1.1.1</version>
</dependency>
If you want to use Kafka's changelog-json format, the program additionally needs the following dependencies:
<dependency>
  <groupId>com.alibaba.ververica</groupId>
  <artifactId>flink-format-changelog-json</artifactId>
  <version>1.0.0</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka_2.11</artifactId>
    <version>1.11.0</version>
</dependency>
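
For context, here is a sketch of what such a Kafka sink could look like (the topic name and broker address are made up); the changelog-json format serializes the insert/update/delete changes so they can be replayed from Kafka later:

CREATE TABLE triggers_kafka (
  trigger_id INT,
  trigger_source STRING,
  modify_time BIGINT,
  enc_type TINYINT
) WITH (
  'connector' = 'kafka',
  'topic' = 'triggers_changelog',                    -- hypothetical topic
  'properties.bootstrap.servers' = 'localhost:9092', -- hypothetical broker
  'format' = 'changelog-json'                        -- provided by flink-format-changelog-json
);

INSERT INTO triggers_kafka SELECT * FROM triggers_cdc;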

2.2.5 Pitfalls

(1) Be sure to put the relevant jars under flink/lib; placing them in other directories may lead to classes not being loaded properly.

2.2.6 Configuration

connector, hostname, port, username, password, database-name, table-name -- required

property-version -- optional

server-id -- optional; matches the server id configured on the MySQL side

server-time-zone -- optional; the time zone of the MySQL server
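
A sketch of the same CDC table with the optional settings spelled out (the table name `triggers_cdc2` and the option values are illustrative):

CREATE TABLE `triggers_cdc2` (
  `trigger_id` INT,
  `trigger_source` STRING,
  `modify_time` BIGINT NOT NULL,
  `enc_type` TINYINT,
  PRIMARY KEY (`trigger_id`) NOT ENFORCED
) WITH (
  'connector' = 'mysql-cdc',
  'hostname' = 'localhost',
  'port' = '3306',
  'username' = 'root',
  'password' = '123456',
  'database-name' = 'azkaban',
  'table-name' = 'triggers_test',
  'server-id' = '5401',                  -- optional: must not clash with other replication clients
  'server-time-zone' = 'Asia/Shanghai'   -- optional: session time zone of the MySQL server
);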


 
