Flink系列--使用Flink cumulate累计窗口按照temperal join 聚合实时还款和借款以及余额总值

业务背景:

  金融机构都会存在放贷场景,预期还款情况,真实还款情况,目前可借余额情况通常采用离线同步方式查询统计获得,用户想要查看实时数据就通常采用实时大屏的方式来解决,算是一类简单的通用场景。

借还款记录表

客户主表放到oracle中,可以通过Flink CDC方式同步到kafka中,kafka中对应的数据字段如下:

 CREATE TABLE KafkaTable (
     `jiaoyije` BIGINT,   -- 交易金额  oracle:NUMBER(17)    example: 210250
     `jiaoyils` STRING,   -- 交易流水  oracle:VARCHAR2(32)  example: PL19122100000718
     `jiaoyima` STRING,   -- 交易码    oracle:VARCHAR2(10)  example: ln05   '3020' 放款 和非3020 还款
     `yeziduan` STRING,   -- 余额字段  oracle:VARCHAR2(8)   example: YSYJLIXI
     `jiaoyirq` STRING,   -- 交易日期  oracle:VARCHAR2(8)   example: 20191221
     `SHIJCHUO` STRING,   -- 时间戳    oracle:NUMBER(19)    example: 1576939296809
     WATERMARK FOR `SHIJCHUO` AS `SHIJCHUO` - INTERVAL '30' MINUTES
     ) WITH (
     'connector' = 'kafka',
     'topic' = 'klnl_dkzhmx',
     'properties.bootstrap.servers' = '10.125.221.118:9092',
     'properties.group.id' = '20221019',
     'scan.startup.mode' = 'earliest-offset',
     'format' = 'canal-json'
     )

借还款余额表

CREATE TABLE mysqlTable (
     ACTUALDAY BIGINT,     -- 实际日期  20221021
     LENDINGVALUE Double, -- 放款值    Double
     LOANVALUE Double,    -- 贷款值    Double
     BALANCEVALUE Double, -- 余额值    Double
     PRIMARY KEY (ACTUALDAY) NOT ENFORCED
     ) WITH (
     'connector' = 'jdbc',
     'url' = 'jdbc:mysql://localhost:3306/mydatabase',
     'table-name' = 'dws_calendar_now_out'
     );

满足业务场景的实现路径

通常需要通过实时同步组件通过logminer方式同步Oracle业务数据到kafka组件中

通过flink 实时作业消费kafka数据,并根据借还款余额表作为维表关联方式,在余额基础上根据实时的借还款记录对余额进行增加和减少,并按照天统计当日的借还款

模拟实现代码demo

当前因为缺少业务库数据,同时为了方便构建作业模拟客户场景,通过自定义source,sink方式实现demo来实现实时大屏效果

public static void main(String[] args) {
        EnvironmentSettings settings = EnvironmentSettings.inStreamingMode();
        TableEnvironment tEnv = TableEnvironment.create(settings);
        tEnv.getConfig().addJobParameter("parallelism.default","1");

//        DataTypes.TIMESTAMP(3)
        tEnv.executeSql("CREATE TABLE source (\n" +
                " jiaoyije BIGINT,  \n" +
                " jiaoyils STRING,  \n" +
                " jiaoyima STRING,  \n" +
                " yeziduan STRING, \n" +
                " jiaoyirq STRING,  \n" +
                " SHIJCHUO TIMESTAMP(3)," +
                " procTime AS PROCTIME()," +
                " WATERMARK FOR SHIJCHUO AS SHIJCHUO - INTERVAL '0' MINUTES\n" +
                ") WITH (\n" +
                "    'connector' = 'bank_credit'" +
                ")");

        tEnv.executeSql("CREATE TABLE dim (\n" +
                "     ACTUALDAY BIGINT,    \n" +
                "     LENDINGVALUE DOUBLE,\n" +
                "     LOANVALUE DOUBLE,\n" +
                "     BALANCEVALUE DOUBLE\n" +
                ") WITH (\n" +
                "    'connector' = 'screen'" +
                ")");


        tEnv.executeSql("CREATE TABLE print (\n" +
                "    window_start  TIMESTAMP(3),\n" +
                "    window_end  TIMESTAMP(3),\n" +
                "     ACTUALDAY_print BIGINT,    \n" +
                "     LENDINGVALUE_print DOUBLE, \n" +
                "     LOANVALUE_print DOUBLE,   \n" +
                "     BALANCEVALUE_print DOUBLE \n" +
                ") WITH (\n" +
                "    'connector' = 'print'" +
                ")");


        tEnv.executeSql("" +
                "create view temp " +
                " as " +
                " select " +
                " jiaoyije,  \n" +
                " jiaoyils,  \n" +
                " jiaoyima,  \n" +
                " yeziduan, \n" +
                " jiaoyirq,  \n" +
                " SHIJCHUO,\n" +
                " BALANCEVALUE\n" +
                " from source left join dim FOR SYSTEM_TIME AS OF source.procTime" +
                " on  cast(DATE_FORMAT(TO_TIMESTAMP_LTZ((UNIX_TIMESTAMP(jiaoyirq,'yyyyMMdd') - 86400),0),'yyyyMMdd') as INT)=dim.ACTUALDAY");


        /**
         * UNIX_TIMESTAMP(jiaoyirq,'yyyyMMdd')  把指定格式的日期转成距离1970-1-1 00:00:00 的秒数
         * UNIX_TIMESTAMP(jiaoyirq,'yyyyMMdd') - 86400  获取前一天, 次类型是BIGINT
         *
         */

        tEnv.executeSql(
                "insert into print select \n" +
                        " window_start," +
                        " window_end," +
                        " cast(jiaoyirq as INT) as ACTUALDAY_print,\n" +
                        " sum(case when jiaoyima='3020'  then  jiaoyije else 0 end)  as LENDINGVALUE_print,\n" +
                        " sum(case when jiaoyima<>'3020'  then jiaoyije else 0 end) as LOANVALUE_print,\n" +
                        " (BALANCEVALUE  + sum(case when jiaoyima='3020'  then  jiaoyije else 0 end)" +
                        " - sum(case when jiaoyima<>'3020'  then jiaoyije else 0 end)) as BALANCEVALUE_print\n" +
                        " FROM TABLE(CUMULATE(" +
                        " TABLE temp,\n" +
                        " DESCRIPTOR(SHIJCHUO),\n" +
                        " INTERVAL '10' SECOND,\n" +
                        " INTERVAL '1' DAY\n" +
                        " )) " +
                        " where yeziduan='ZHCHBJIN' \n" +
                        " and jiaoyirq = DATE_FORMAT(CURRENT_TIMESTAMP,'yyyyMMdd') " +
                        " group by jiaoyirq,window_start,window_end,BALANCEVALUE "
        );

实现效果:

      模拟预置的昨日余额是10000元,今天实时借还款记录都是10元,设计还款多于借款

      window_start,window_end,当日日期,当日还款总额,当日借款总额,实时余额

实时余额 =  10000 + 当日还款总额 - 当日借款总额

关键技术点说明

1、要求按照天统计,按照一定频率更新结果

此需要使用累计窗口cumulate,可以定时输出结果,统计值在窗口时间内一直累加

2、需要在前一天的余额上通过借还款记录对实时余额进行计算

这个需要通过lookup join的方式获取到前一天的余额值,在此基础进行计算

上述view 表通过这个方式实现

3、sql中有case when判断借还款,有各种日期处理函数都是系统函数

在 Flink SQL 中,CUMULATE 窗口函数是一种特殊的窗口,它允许你计算从一个起始点到当前时间点的数据累计值。这种窗口对于需要计算实时累计指标(如累计 PV、UV)的场景非常有用。

CUMULATE 窗口的语法如下:

CUMULATE(TABLE table_name, DESCRIPTOR(time_column), INTERVAL 'window_size', INTERVAL 'slide_interval')

  • table_name 是你的输入表。
  • time_column 是你用来定义窗口的时间列。
  • INTERVAL 'window_size' 定义了窗口的大小。
  • INTERVAL 'slide_interval' 定义了窗口滑动的时间间隔。

例如,如果你想计算每天累计到当前分钟的 PV 和 UV,你可以这样使用 CUMULATE 窗口:

SELECT
    date_format(window_start, 'yyyy-MM-dd') cal_day,
    behavior,
    count(user_id) pv,
    count(distinct user_id) uv
FROM TABLE(
    CUMULATE(TABLE user_log, DESCRIPTOR(proc_time), INTERVAL '10' SECOND, INTERVAL '1' DAY)
) GROUP BY window_start, window_end, behavior

在这个例子中,CUMULATE 窗口会根据 proc_time 列来计算每10秒钟的累计 PV 和 UV,直到当天的结束。

CUMULATE 窗口的优点是它可以输出一个追加流(append stream),这意味着每个输出都是新增的累计结果,而不会像 RETRACT 流那样有数据的撤回。这使得它非常适合用于实时仪表板和监控系统,因为它们需要连续更新数据。

附录

source表

模拟实时的借还款记录

public class BankCreditDataGenFactory implements DynamicTableSourceFactory {
    @Override
    public DynamicTableSource createDynamicTableSource(DynamicTableFactory.Context context) {
        return new BankCreditDataTableSource();
    }

    @Override
    public String factoryIdentifier() {
        return "bank_credit";
    }

    @Override
    public Set<ConfigOption<?>> requiredOptions() {
        return new HashSet<>();
    }

    @Override
    public Set<ConfigOption<?>> optionalOptions() {
        return new HashSet<>();
    }
}

public class BankCreditDataTableSource implements ScanTableSource {

    @Override
    public ChangelogMode getChangelogMode() {
        return ChangelogMode.insertOnly();
    }

    @Override
    public ScanRuntimeProvider getScanRuntimeProvider(ScanContext runtimeProviderContext) {
        return SourceFunctionProvider.of(new BankCreditDataSourceFunction(),true);
    }

    @Override
    public DynamicTableSource copy() {
        return new BankCreditDataTableSource();
    }

    @Override
    public String asSummaryString() {
        return this.getClass().getSimpleName();
    }
}

public class BankCreditDataSourceFunction implements SourceFunction {

    private final Long[] jiaoyijine = new Long[]{
            10l,10l,10l,10l,10l,10l,10l,10l,10l,10l
    };

    //todo 考虑重复消费处理
    private final String[] jiaoyiliushui = new String[]{
            "PL19122100000718", "PL19122100000718", "PL19122100000718", "PL19122100000718", "PL19122100000718"
    };

    //3020 为放款 其他为收款
    private final String[] jiaoyima = new String[]{
            "ln05", "3020", "3020", "3020", "ln05"
    };

    //余额字段 ZHCHBJIN 为收款,放款,其它不是
    private final String[] yueziduan = new String[]{
            "ZHCHBJIN", "ZHCHBJIN", "ZHCHBJIN", "ZHCHBJIN", "AAAAAAAAA"
    };

    @Override
    public void run(SourceContext ctx) throws Exception {
        /**
         " `jiaoyije` BIGINT,  \n" +
         " `jiaoyils` STRING,  \n" +
         " `jiaoyima` STRING,  \n" +
         " `yeziduan` STRING, \n" +
         "  `jiaoyirq` STRING,  \n" +
         " `SHIJCHUO` STRING,  \n" +
          */
        GenericRowData row = null;
        Random random = new Random();
        long startTime = System.currentTimeMillis();
        while (true) {
            if((System.currentTimeMillis() - startTime) >  1000) {
                startTime = System.currentTimeMillis();
                row = new GenericRowData(6);
                row.setField(0, jiaoyijine[random.nextInt(jiaoyijine.length)]);
                row.setField(1, StringData.fromString(jiaoyiliushui[random.nextInt(jiaoyiliushui.length)]));
                row.setField(2, StringData.fromString(jiaoyima[random.nextInt(jiaoyima.length)]));
                row.setField(3, StringData.fromString(yueziduan[random.nextInt(yueziduan.length)]));
                row.setField(4, StringData.fromString(LocalDateTime.now().format(DateTimeFormatter.ofPattern("yyyyMMdd"))));
                row.setField(5, TimestampData.fromLocalDateTime(
                        LocalDateTime.now()));
                ctx.collect(row);
            }
        }
    }

    @Override
    public void cancel() {
    }
}

dim表

模拟产生借还款余额表数据

public class ScreenDataGenFactory implements DynamicTableSourceFactory {
    @Override
    public DynamicTableSource createDynamicTableSource(DynamicTableFactory.Context context) {
        return new ScreenDataTableSource();
    }

    @Override
    public String factoryIdentifier() {
        return "screen";
    }

    @Override
    public Set<ConfigOption<?>> requiredOptions() {
        return new HashSet<>();
    }

    @Override
    public Set<ConfigOption<?>> optionalOptions() {
        return new HashSet<>();
    }
}


public class ScreenDataTableSource implements LookupTableSource,ScanTableSource {

    @Override
    public DynamicTableSource copy() {
        return new ScreenDataTableSource();
    }

    @Override
    public String asSummaryString() {
        return this.getClass().getSimpleName();
    }

    @Override
    public LookupRuntimeProvider getLookupRuntimeProvider(LookupContext context) {
        return TableFunctionProvider.of(new ScreenDataSourceFunction());
    }

    @Override
    public ChangelogMode getChangelogMode() {
        return ChangelogMode.insertOnly();
    }

    @Override
    public ScanRuntimeProvider getScanRuntimeProvider(ScanContext runtimeProviderContext) {
        return null;
    }
}

public class ScreenDataSourceFunction extends TableFunction<RowData>  {

    private transient Map<Long,GenericRowData> cache;


    @Override
    public void open(FunctionContext context) throws Exception {
        super.open(context);
        /**
         *  ACTUALDAY BIGINT,     -- 实际日期
         *   LENDINGVALUE DECIMAL, -- 放款值
         *   LOANVALUE DECIMAL,    -- 贷款值
         *   BALANCEVALUE DECIMAL, -- 余额值
         */
        cache = new HashMap<>();
        Long yyyyMMddYesterday = Long.valueOf(LocalDateTime.now().minusDays(1l).format(DateTimeFormatter.ofPattern("yyyyMMdd")));
        Long yyyyMMddToday = Long.valueOf(LocalDateTime.now().format(DateTimeFormatter.ofPattern("yyyyMMdd")));

        GenericRowData yesterday = new GenericRowData(4);
        yesterday.setField(0, yyyyMMddYesterday);
        yesterday.setField(1, 0.0d);
        yesterday.setField(2, 0.0d);
        yesterday.setField(3, 10000.0d);
        GenericRowData today = new GenericRowData(4);
        today.setField(0, yyyyMMddToday);
        today.setField(1, 0.0d);
        today.setField(2, 0.0d);
        today.setField(3, 0.0d);
        cache.put(yyyyMMddYesterday,yesterday);
        cache.put(yyyyMMddToday,today);

    }

    /**
     * This is a lookup method which is called by Flink framework in runtime.
     *
     * @param keys lookup keys
     */
    public void eval(Object... keys) {
        for (Object key : keys) {
            if(null !=key){
                GenericRowData row = cache.get(Long.valueOf(String.valueOf(key)));
                collect(row);
            }
        }
    }

}

print表

此为flink提供的DataGen 就不介绍了

maven依赖

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <parent>
        <artifactId>FlinkTableApiTest</artifactId>
        <groupId>org.example</groupId>
        <version>1.0-SNAPSHOT</version>
    </parent>
    <modelVersion>4.0.0</modelVersion>

    <artifactId>cumulate</artifactId>

    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

    <dependencies>
                <dependency>
                    <groupId>org.slf4j</groupId>
                    <artifactId>slf4j-simple</artifactId>
                </dependency>
        <dependency>
            <groupId>org.junit.jupiter</groupId>
            <artifactId>junit-jupiter-api</artifactId>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-statebackend-rocksdb</artifactId>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-java</artifactId>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-runtime</artifactId>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java</artifactId>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-clients</artifactId>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-runtime-web</artifactId>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-kafka</artifactId>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-api-java</artifactId>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-base</artifactId>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-planner_2.12</artifactId>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-state-processor-api</artifactId>
        </dependency>

        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <scope>test</scope>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-core</artifactId>
        </dependency>

    </dependencies>
</project>

SPI

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值