Notes on Problems in a Flink SQL Example

Consuming Kafka Data

First, create the stream execution environment and the table environment:

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);
env.setParallelism(1);
// enable checkpointing, one checkpoint every 5 seconds
env.enableCheckpointing(5000L);
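
If more control over checkpointing is needed, the CheckpointConfig exposes further knobs; a minimal sketch, where the mode, pause, and storage path are illustrative assumptions rather than values from the original job:

import org.apache.flink.streaming.api.CheckpointingMode;

// exactly-once is the default delivery guarantee, shown here for clarity
env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
// leave at least 1 second between the end of one checkpoint and the start of the next
env.getCheckpointConfig().setMinPauseBetweenCheckpoints(1000L);
// where checkpoint snapshots are stored (path is a placeholder)
env.getCheckpointConfig().setCheckpointStorage("file:///tmp/flink-checkpoints");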

Next, create the source table through the Kafka connector.

The field order does not need to match the order in Kafka, but the field names and types must match the fields in the Kafka messages so they can be mapped. This example works with processing time, which is used to trigger the windows; to use processing time, declare a computed column with the PROCTIME() function:

`ts` AS PROCTIME()
-- A computed column like this can trigger windows directly; no watermark needs to be
-- defined on it. If you go by event time instead (i.e., the data carries a timestamp),
-- you must define a watermark. The following form handles out-of-order data with up to
-- 5 seconds of delay; ts is the field the watermark is defined on:
WATERMARK FOR ts AS ts - INTERVAL '5' SECOND
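
For reference, a minimal event-time variant of the table might look like the sketch below; picking created_dtm_loc as the event-time column (it must be TIMESTAMP(3) to carry a watermark) is my assumption:

CREATE TABLE kafka_table_evt (
  `inv_transaction_id` BIGINT,
  `created_dtm_loc` TIMESTAMP(3),
  WATERMARK FOR created_dtm_loc AS created_dtm_loc - INTERVAL '5' SECOND
) WITH (
  'connector' = 'kafka',
  'topic' = 'dwd_inv_transaction',
  'properties.bootstrap.servers' = 'shucang001:9092,shucang002:9092,shucang003:9092',
  'properties.group.id' = 'testGroup05',
  'scan.startup.mode' = 'earliest-offset',
  'format' = 'json'
)
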
tEnv.executeSql(
                    "CREATE TABLE kafka_table (" +
                            "`action_type` VARCHAR,\n" +
                            "`allocated_qty` DOUBLE,\n" +
                            "`batch_no` VARCHAR,\n" +
                            "`brand_code` VARCHAR,\n" +
                            "`brand_name` VARCHAR,\n" +
                            "`client_code` VARCHAR,\n" +
                            "`client_name` VARCHAR,\n" +
                            "`container_code` VARCHAR,\n" +
                            "`country_of_origin` VARCHAR,\n" +
                            "`created_by_user` VARCHAR,\n" +
                            "`created_by_user_id` BIGINT,\n" +
                            "`created_dtm_loc` TIMESTAMP,\n" +
                            "`created_office` VARCHAR,\n" +
                            "`created_time_zone` VARCHAR,\n" +
                            "`csNum` DOUBLE,\n" +
                            "`db` VARCHAR,\n" +
                            "`eventTimeDay` DATE,\n" +
                            "`exp_date` DATE,\n" +
                            "`from_location_code` VARCHAR,\n" +
                            "`frozen_qty` DOUBLE,\n" +
                            "`in_transit_qty` DOUBLE,\n" +
                            "`inv_adjustment_qty` DOUBLE,\n" +
                            "`inv_adjustment_type` VARCHAR,\n" +
                            "`inv_location_inventory_id` BIGINT,\n" +
                            "`inv_lot_id` BIGINT,\n" +
                            "`inv_transaction_id` BIGINT,\n" +
                            "`inventory_quality` VARCHAR,\n" +
                            "`inventory_type` VARCHAR,\n" +
                            "`item_code` VARCHAR,\n" +
                            "`item_name` VARCHAR,\n" +
                            "`location_code` VARCHAR,\n" +
                            "`locked` TINYINT,\n" +
                            "`lot_attr1` VARCHAR,\n" +
                            "`lot_attr10` VARCHAR,\n" +
                            "`lot_attr11` VARCHAR,\n" +
                            "`lot_attr12` VARCHAR,\n" +
                            "`lot_attr13` VARCHAR,\n" +
                            "`lot_attr14` VARCHAR,\n" +
                            "`lot_attr15` VARCHAR,\n" +
                            "`lot_attr16` DATE,\n" +
                            "`lot_attr2` VARCHAR,\n" +
                            "`lot_attr3` VARCHAR,\n" +
                            "`lot_attr4` VARCHAR,\n" +
                            "`lot_attr5` VARCHAR,\n" +
                            "`lot_attr6` VARCHAR,\n" +
                            "`lot_attr7` VARCHAR,\n" +
                            "`lot_attr8` VARCHAR,\n" +
                            "`lot_attr9` VARCHAR,\n" +
                            "`lpn_no` VARCHAR,\n" +
                            "`mfg_date` DATE,\n" +
                            "`on_hand_qty` DOUBLE,\n" +
                            "`op` VARCHAR,\n" +
                            "`owner_code` VARCHAR,\n" +
                            "`pack_code` VARCHAR,\n" +
                            "`palletNum` DOUBLE,\n" +
                            "`po_no` VARCHAR,\n" +
                            "`primaryKey` VARCHAR,\n" +
                            "`priority` SMALLINT,\n" +
                            "`process_state` SMALLINT,\n" +
                            "`qty_uom` VARCHAR,\n" +
                            "`received_date` DATE,\n" +
                            "`record_version` BIGINT,\n" +
                            "`ref_code_id_1` VARCHAR,\n" +
                            "`ref_code_id_10` VARCHAR,\n" +
                            "`ref_code_id_2` VARCHAR,\n" +
                            "`ref_code_id_3` VARCHAR,\n" +
                            "`ref_code_id_4` VARCHAR,\n" +
                            "`ref_code_id_5` VARCHAR,\n" +
                            "`ref_code_id_6` VARCHAR,\n" +
                            "`ref_code_id_7` VARCHAR,\n" +
                            "`ref_code_id_8` VARCHAR,\n" +
                            "`ref_code_id_9` VARCHAR,\n" +
                            "`ref_value_1` VARCHAR,\n" +
                            "`ref_value_10` VARCHAR,\n" +
                            "`ref_value_2` VARCHAR,\n" +
                            "`ref_value_3` VARCHAR,\n" +
                            "`ref_value_4` VARCHAR,\n" +
                            "`ref_value_5` VARCHAR,\n" +
                            "`ref_value_6` VARCHAR,\n" +
                            "`ref_value_7` VARCHAR,\n" +
                            "`ref_value_8` VARCHAR,\n" +
                            "`ref_value_9` VARCHAR,\n" +
                            "`reference_line_no` VARCHAR,\n" +
                            "`reference_no` VARCHAR,\n" +
                            "`reference_type` VARCHAR,\n" +
                            "`sales_no` VARCHAR,\n" +
                            "`sales_status` VARCHAR,\n" +
                            "`table` VARCHAR,\n" +
                            "`total_gross_weight` DOUBLE,\n" +
                            "`total_net_weight` DOUBLE,\n" +
                            "`total_weight` DOUBLE,\n" +
                            "`transaction_no` VARCHAR,\n" +
                            "`updated_by_user` VARCHAR,\n" +
                            "`updated_by_user_id` BIGINT,\n" +
                            "`updated_dtm_loc` TIMESTAMP,\n" +
                            "`updated_office` VARCHAR,\n" +
                            "`updated_time_zone` VARCHAR,\n" +
                            "`vendor_code` VARCHAR,\n" +
                            "`vendor_name` VARCHAR,\n" +
                            "`volume` DOUBLE,\n" +
                            "`warehouse_code` VARCHAR,\n" +
                            "`warehouse_id` BIGINT,\n" +
//                            "`process_time` AS PROCTIME(),\n" +
//                            "`ts` as LOCALTIMESTAMP,\n" +
                            "`ts` AS PROCTIME()\n" +
//                            " watermark for ts as ts - interval '5' second\n" +
                            ")WITH (\n" +
                            "'connector' = 'kafka',\n" +
                            "'topic' = 'dwd_inv_transaction',\n" +
                            "'properties.bootstrap.servers' = 'shucang001:9092,shucang002:9092,shucang003:9092',\n" +
                            // consumer group used for the ads_test job
//                            "'properties.group.id' = 'testGroup',\n" +
                            // consumer group used for the ads_test01 job
                            "'properties.group.id' = 'testGroup05',\n" +
                            // start from the earliest offset when no offset has been committed
                            "'scan.startup.mode' = 'earliest-offset',\n" +
                            // 'latest-offset' starts from the latest offset; to resume from
                            // committed offsets, use 'group-offsets' instead
//                            "'scan.startup.mode' = 'latest-offset',\n" +
                            "'format' = 'json'\n" +
                            ")"
        );

Note the values accepted by scan.startup.mode:
group-offsets: start from offsets committed in ZK / the Kafka brokers for the given consumer group.
earliest-offset: start from the earliest possible offset.
latest-offset: start from the latest offset.
timestamp: start from a user-supplied per-partition timestamp.
specific-offsets: start from user-supplied specific offsets for each partition.
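
The last two modes each require an additional connector option; a sketch of both, with placeholder values:

-- timestamp mode: epoch milliseconds to start reading from
'scan.startup.mode' = 'timestamp',
'scan.startup.timestamp-millis' = '1651234567890',

-- specific-offsets mode: explicit per-partition offsets
'scan.startup.mode' = 'specific-offsets',
'scan.startup.specific-offsets' = 'partition:0,offset:42;partition:1,offset:300',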

Executing the GROUP BY

String adsSql = "select \n" +
                "eventTimeDay,\n" +
                "warehouse_code,\n" +
                "client_code,\n" +
                "client_name, \n" +
                "reference_type,\n" +
                "sum(ea_num)  ea_num,\n" +
                "sum(pallet_num) pallet_num, \n" +
                "sum(cs_num) cs_num,\n" +
                "ROUND(sum(total_weight),3) total_weight,\n" +
                "ROUND(sum(total_gross_weight),3) total_gross_weight,\n" +
                "ROUND(sum(total_net_weight),3) total_net_weight\n" +
                "from \n" +
                "(\n" +
                "SELECT\n" +
                "eventTimeDay,\n" +
                "warehouse_code,\n" +
                "client_code,\n" +
                "client_name,\n" +
                "lpn_no,\n" +
                "reference_type,\n" +
                "SUM(inv_adjustment_qty) as ea_num,\n" +
                "CEILING(SUM(palletNum)) as pallet_num ,\n" +
                "CEILING(SUM(csNum)) as cs_num,\n" +
                "ROUND(SUM(total_weight),3) as total_weight,\n" +
                "ROUND(SUM(total_gross_weight),3) as total_gross_weight,\n" +
                "ROUND(SUM(total_net_weight),3) as total_net_weight\n" +
                "FROM\n" +
                "kafka_table\n" +
                "GROUP BY\n" +
                "TUMBLE(ts, INTERVAL '25' SECOND),\n" +
                "eventTimeDay,\n" +
                "warehouse_code,\n" +
                "client_code,\n" +
                "client_name,\n" +
                "lpn_no,\n" +
                "reference_type\n" +
                ") b where ea_num>0\n" +
                "group by \n" +
                "eventTimeDay,\n" +
                "warehouse_code,\n" +
                "client_code,\n" +
                "client_name,\n" +
                "reference_type";
Table adstable = tEnv.sqlQuery(adsSql);
tEnv.toChangelogStream(adstable).print();
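
Because the query is a grouped aggregation, toChangelogStream emits a changelog rather than an append-only stream: each update to a group retracts the old row (-U) and emits the new one (+U). A hypothetical excerpt of the printed output, with made-up values purely to show the format:

+I[2022-05-01, WH01, C001, clientA, PO, 120.0, 2.0, 10.0, 35.5, 36.1, 34.9]
-U[2022-05-01, WH01, C001, clientA, PO, 120.0, 2.0, 10.0, 35.5, 36.1, 34.9]
+U[2022-05-01, WH01, C001, clientA, PO, 150.0, 3.0, 12.0, 42.25, 43.0, 41.7]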

The window type and its trigger interval are set in the GROUP BY clause:

TUMBLE(ts, INTERVAL '25' SECOND)

This declares a tumbling window that fires once every 25 seconds.
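
If a tumbling window does not fit, the same legacy group-window syntax also offers sliding and session windows; a sketch of each (the intervals and the trailing group key are placeholders):

-- sliding (hopping) window: 25-second windows, evaluated every 5 seconds
GROUP BY HOP(ts, INTERVAL '5' SECOND, INTERVAL '25' SECOND), warehouse_code

-- session window: the window closes after 30 seconds without new data
GROUP BY SESSION(ts, INTERVAL '30' SECOND), warehouse_code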

Sinking to a MySQL Database

The target table needs to be created in MySQL first.

String mysql = "create table ads_table(\n" +
                "  `id` STRING,\n" +
                "  `event_time_day` DATE,\n" +
                "  `warehouse_code` STRING,\n" +
                "  `client_code` STRING ,\n" +
                "  `client_name` STRING ,\n" +
                "  `reference_type` STRING,\n" +
                "  `ea_num` DOUBLE,\n" +
                "  `pallet_num` DOUBLE,\n" +
                "  `cs_num` DOUBLE,\n" +
                "  `total_weight` DOUBLE,\n" +
                "  `total_gross_weight` DOUBLE,\n" +
                "  `total_net_weight` DOUBLE,\n" +
                "   PRIMARY KEY (id) NOT ENFORCED" +
                "  ) WITH (\n" +
                "\t'connector' = 'jdbc',\n" +
                "  'driver'='com.mysql.cj.jdbc.Driver',\n" +
                "  'username' = 'root',  \n" +
                "  'password' = '123456',  \n" +
                "  'table-name' = 'ads_test01',\n" +
//                "  'table-name' = 'ads_test03',\n" +
//                "  'url' = 'jdbc:mysql://localhost:3306/test?useSSL=true&useUnicode=true&characterEncoding=utf-8&serverTimezone=Asia/Shanghai'\n" +
                "  'url' = 'jdbc:mysql://40.73.64.201:50010/test?useSSL=true&useUnicode=true&characterEncoding=utf-8&serverTimezone=Asia/Shanghai'\n" +
                "  )";

Note that the newer driver class is used here (com.mysql.cj.jdbc.Driver): the CDC connector used earlier is version 2.2.1, which requires the newer MySQL driver, but this does not prevent it from connecting to a MySQL 5 database.

Next, create the table and insert data into it. Here you run into the primary key problem. My approach is to concatenate the GROUP BY fields into one unique id column and write that to the database: because the query uses GROUP BY, writes trigger upsert operations, and a primary key is needed to locate the row to update.

tEnv.executeSql(mysql);
tEnv.executeSql("insert into ads_table SELECT CONCAT_WS('-',CAST(eventTimeDay AS STRING),client_code,reference_type) as id,eventTimeDay,warehouse_code,client_code,client_name,reference_type,ea_num,pallet_num,cs_num,total_weight,total_gross_weight,total_net_weight FROM " + adstable);

Problem

If there are many grouping fields, the concatenated id may exceed the column length and get truncated, making the id inaccurate and ultimately corrupting the data. I am wondering whether there is a way to derive a unique, fixed-length (numeric-like) id from the concatenated string and write that to the database instead, which would avoid the length problem. Another route is to implement the sink through the API with hand-written JDBC, but that increases the amount of code and is not very reusable.
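
One way to realize the fixed-length id idea with what Flink already provides: the built-in MD5 function returns a 32-character digest no matter how long the concatenated string is, so a VARCHAR(32) id column can never be truncated (hash collisions are theoretically possible but extremely unlikely). A sketch of the insert rewritten this way:

// the id is now MD5(concatenated group keys), always 32 characters long
tEnv.executeSql("insert into ads_table SELECT MD5(CONCAT_WS('-',CAST(eventTimeDay AS STRING),warehouse_code,client_code,client_name,reference_type)) as id,eventTimeDay,warehouse_code,client_code,client_name,reference_type,ea_num,pallet_num,cs_num,total_weight,total_gross_weight,total_net_weight FROM " + adstable);
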
Solution 1:
Define a composite primary key on the MySQL table: concatenate only two fields, and let a composite primary key over the remaining fields pinpoint the rows that need updating.
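
A sketch of what the Flink-side DDL could look like under this scheme; which columns form the composite key is my assumption based on the outer GROUP BY above, and the MySQL table must declare the same composite primary key:

create table ads_table (
  `event_time_day` DATE,
  `warehouse_code` STRING,
  `client_code` STRING,
  `client_name` STRING,
  `reference_type` STRING,
  `ea_num` DOUBLE,
  `pallet_num` DOUBLE,
  `cs_num` DOUBLE,
  `total_weight` DOUBLE,
  `total_gross_weight` DOUBLE,
  `total_net_weight` DOUBLE,
  PRIMARY KEY (event_time_day, warehouse_code, client_code, reference_type) NOT ENFORCED
) WITH (
  'connector' = 'jdbc',
  'driver' = 'com.mysql.cj.jdbc.Driver',
  'username' = 'root',
  'password' = '123456',
  'table-name' = 'ads_test01',
  'url' = 'jdbc:mysql://40.73.64.201:50010/test?useSSL=true&useUnicode=true&characterEncoding=utf-8&serverTimezone=Asia/Shanghai'
)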
