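Flink in Practice (Java/MongoDB/MySQL)

For installing Flink, see my earlier post:

Flink Installation (Standalone Mode, Windows), CSDN blog

I. Preface

Goal of this post: take data from MongoDB, process it, and store the result in MySQL, once as stream processing and once as batch processing.

Tutorials that pair Flink with a simple Java program are easy to find online, so this post does not repeat the basics.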

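II. Program design

1. Create a Spring Boot project

I use Java 8.

Set the Server URL to Alibaba Cloud's initializer, because the official Spring Boot initializer no longer supports Java 8:

https://start.aliyun.com

2. Dependencies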

    <properties>
        <java.version>1.8</java.version>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
        <spring-boot.version>2.6.13</spring-boot.version>
        <flink-version>1.14.6</flink-version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-java</artifactId>
            <version>${flink-version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java_2.12</artifactId>
            <version>${flink-version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-clients_2.12</artifactId>
            <version>${flink-version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-base</artifactId>
            <version>${flink-version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-jdbc_2.12</artifactId>
            <version>${flink-version}</version>
        </dependency>
        <!-- lombok -->
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <version>1.18.30</version>
            <optional>true</optional>
        </dependency>
        <!-- mongodb -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-mongodb</artifactId>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-mongodb</artifactId>
            <version>1.1.0-1.18</version>
        </dependency>

        <dependency>
            <groupId>com.alibaba.fastjson2</groupId>
            <artifactId>fastjson2</artifactId>
            <version>2.0.50</version>
        </dependency>

        <!-- Spring Boot Starter MyBatis -->
        <dependency>
            <groupId>com.baomidou</groupId>
            <artifactId>mybatis-plus-boot-starter</artifactId>
            <version>3.5.6</version>
        </dependency>

        <!-- MySQL Driver -->
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
        </dependency>
    </dependencies>

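3. Batch processing

Logic: read all documents from the record collection in MongoDB and partition them by day. Records from the same day go into the same time window (effectively that day's data set). The job then computes the daily average, maximum and minimum temperature from each window and stores them in MySQL's device_temperature table.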
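To make "same day, same window" concrete, here is a minimal plain-Java sketch of how a one-day tumbling window with no offset buckets a timestamp. It is not Flink code, and the class and method names are made up for illustration. The point it shows is that such windows are aligned to the epoch, so a "day" runs from 00:00 UTC to the next 00:00 UTC, not from local midnight.

```java
import java.time.Instant;

/** Sketch: bucketing an epoch-millisecond timestamp into a one-day tumbling window (no offset). */
final class DayWindowSketch {
    static final long DAY_MS = 24L * 60 * 60 * 1000;

    /** Start of the day-long window that contains the timestamp; aligned to 00:00 UTC. */
    static long windowStart(long timestampMs) {
        return timestampMs - Math.floorMod(timestampMs, DAY_MS);
    }

    public static void main(String[] args) {
        long ts = Instant.parse("2024-09-16T08:00:00Z").toEpochMilli();
        // Prints the UTC midnight that opens the window for this reading
        System.out.println(Instant.ofEpochMilli(windowStart(ts)));
    }
}
```

If days should follow a local time zone instead, Flink's window assigner accepts an offset as a second argument, for example Time.hours(-8) for UTC+8.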

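3.1 Table structures

MongoDB:

Database: flink_demo

Collection: record

Document structure: each document has a temperature field (stored as a string) and a gather_time field (a date).

Entity class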

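import lombok.Data;

import java.util.Date;

@Data
public class Record {

    /** Device temperature */
    private Double temperature;

    /** Time at which the reading was gathered */
    private Date gatherTime;
}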

MySQL

Table structure

CREATE TABLE device_temperature (
    id VARCHAR(50) PRIMARY KEY,
    agv DOUBLE NOT NULL,
    max DOUBLE NOT NULL,
    min DOUBLE NOT NULL,
    gather_date VARCHAR(50)
);

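Entity class

import lombok.AllArgsConstructor;
import lombok.Builder;
import lombok.Data;
import lombok.NoArgsConstructor;

@Data
@Builder
@NoArgsConstructor
@AllArgsConstructor
public class DeviceTemperature {

    private String id;
    /** Average value */
    private Double agv;
    /** Maximum value */
    private Double max;
    /** Minimum value */
    private Double min;
    /** Gather date, format yyyy-MM-dd */
    private String gatherDate;
}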

3.2 Utility class

import java.text.SimpleDateFormat;

public class CommonUtil {

    // Counter used when generating ids; incremented once for every id generated
    private static int tag = 0;

    public static String generateId() {
        SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMddHHmmss");
        String id = sdf.format(System.currentTimeMillis());
        tag++;
        if (tag > 9999) {
            tag = 0;
        }
        return id + String.format("%04d", tag);
    }
}
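
CommonUtil.generateId increments a plain static int, which is fine as long as only one thread calls it (the window operator in this post runs with a single parallel instance). If the helper might be called from several threads at once, one possible variant is sketched below. The class name SafeIdSketch is hypothetical, and the id format (14-digit timestamp plus 4-digit counter) is kept the same.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch: the same id format, generated with thread-safe building blocks. */
final class SafeIdSketch {
    // DateTimeFormatter is immutable and safe to share between threads
    private static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("yyyyMMddHHmmss");
    private static final AtomicInteger COUNTER = new AtomicInteger();

    static String generateId() {
        // Atomically advance the counter, wrapping back to 0 after 9999
        int seq = COUNTER.updateAndGet(v -> v >= 9999 ? 0 : v + 1);
        return LocalDateTime.now().format(FORMAT) + String.format("%04d", seq);
    }

    public static void main(String[] args) {
        System.out.println(generateId());
    }
}
```

As in the original helper, ids can still repeat if more than 10000 are generated within one second.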
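
3.3 Main method

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.configuration.MemorySize;
import org.apache.flink.connector.jdbc.JdbcConnectionOptions;
import org.apache.flink.connector.jdbc.JdbcExecutionOptions;
import org.apache.flink.connector.jdbc.JdbcSink;
import org.apache.flink.connector.mongodb.source.MongoSource;
import org.apache.flink.connector.mongodb.source.enumerator.splitter.PartitionStrategy;
import org.apache.flink.connector.mongodb.source.reader.deserializer.MongoDeserializationSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.windowing.ProcessAllWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;
import org.bson.BsonDocument;

import java.math.BigDecimal;
import java.math.RoundingMode;
import java.sql.SQLException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class BatchStatistics {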
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        MongoDeserializationSchema<Record> deserializationSchema = new MongoDeserializationSchema<Record>() {
            @Override
            public Record deserialize(BsonDocument document) {
                // Convert the document read from MongoDB into a Java object for the downstream computation
                Record record = new Record();
                record.setTemperature(Double.parseDouble(document.getString("temperature").getValue()));
                record.setGatherTime(new Date(document.getDateTime("gather_time").getValue()));
                return record;
            }

            @Override
            public TypeInformation<Record> getProducedType() {
                return TypeInformation.of(Record.class);
            }
        };
        MongoSource<Record> mongoSource = MongoSource.<Record>builder()
                .setUri("mongodb://127.0.0.1:27017")  // database URI
                .setDatabase("flink_demo") // database
                .setCollection("record") // collection
                .setFetchSize(100) // number of documents fetched from the MongoDB server per round trip
                .setLimit(3000000) // maximum number of documents; anything beyond this is not read
                .setNoCursorTimeout(true) // if true, the MongoDB cursor does not time out when idle; suited to long-running queries
                .setPartitionStrategy(PartitionStrategy.SAMPLE)
                .setPartitionSize(MemorySize.parse("2GB"))  // size of each partition
                .setSamplesPerPartition(10) // number of samples per partition
                .setDeserializationSchema(deserializationSchema)
                .build();
        DataStream<Record> sourceStream = env.fromSource(mongoSource, WatermarkStrategy.noWatermarks(), "MongoSource");
        sourceStream
                // Use gatherTime as the event time that decides window membership
                .assignTimestampsAndWatermarks(
                        WatermarkStrategy
                                .<Record>forMonotonousTimestamps()
                                .withTimestampAssigner((record, timestamp) -> record.getGatherTime().getTime())
                )
                // Put all records from the same day into one time window
                .windowAll(TumblingEventTimeWindows.of(Time.days(1)))
                .process(new ProcessAllWindowFunction<Record, DeviceTemperature, TimeWindow>() {
                    @Override
                    public void process(ProcessAllWindowFunction<Record, DeviceTemperature, TimeWindow>.Context context, Iterable<Record> elements, Collector<DeviceTemperature> out) {
                        BigDecimal sum = new BigDecimal(0);
                        Double max = null;
                        Double min = null;
                        int size = 0;
                        String gatherDate = null;
                        for (Record element : elements) {
                            if (max == null || max < element.getTemperature()) {
                                max = element.getTemperature();
                            }
                            if (min == null || min > element.getTemperature()) {
                                min = element.getTemperature();
                            }

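                            // Sum with BigDecimal to avoid floating-point precision loss;
                            // valueOf uses the double's decimal string form rather than its exact binary value
                            sum = sum.add(BigDecimal.valueOf(element.getTemperature()));
                            size++;
                            // All records in this window belong to the same day, so the first record's date is enough
                            if (gatherDate == null) {
                                SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd");
                                gatherDate = formatter.format(element.getGatherTime());
                            }
                        }
                        // Give divide() a scale and rounding mode; without them it throws
                        // ArithmeticException when the quotient has no finite decimal expansion
                        double doubleValue = sum.divide(BigDecimal.valueOf(size), 4, RoundingMode.HALF_UP).doubleValue();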
                        DeviceTemperature result = DeviceTemperature.builder()
                                .id(CommonUtil.generateId())
                                .agv(doubleValue)
                                .max(max)
                                .min(min)
                                .gatherDate(gatherDate)
                                .build();
                        out.collect(result);
                    }
                }).addSink(JdbcSink.sink(
                        "INSERT INTO device_temperature (id, agv, max, min, gather_date) \n" +
                                "                        VALUES (?, ?, ?, ?, ?)",
                        (statement, deviceTemperature) -> {
                            // Bind the prepared-statement parameters for the MySQL insert
                            try {
                                statement.setString(1, deviceTemperature.getId());
                                statement.setDouble(2, deviceTemperature.getAgv());
                                statement.setDouble(3, deviceTemperature.getMax());
                                statement.setDouble(4, deviceTemperature.getMin());
                                statement.setString(5, deviceTemperature.getGatherDate());
                            } catch (SQLException e) {
                                throw new RuntimeException(e);
                            }
                        },
                        JdbcExecutionOptions.builder()
                                .withBatchSize(1) // rows are written together once batchSize rows are pending, or when all data has been processed (in stream processing the insert only runs once batchSize is reached)
                                .build(),
                        new JdbcConnectionOptions.JdbcConnectionOptionsBuilder()
                                .withUrl("jdbc:mysql://127.0.0.1:3306/flink_demo")
                                .withDriverName("com.mysql.cj.jdbc.Driver")
                                .withUsername("root")
                                .withPassword("123456")
                                .build()
                ));
        env.execute("Temperature batch job");
    }

}
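
The averaging step in the window function divides one BigDecimal by another. One detail worth knowing is that BigDecimal.divide with no scale or rounding mode throws ArithmeticException whenever the exact quotient has a non-terminating decimal expansion, for example 1 divided by 3. The sketch below shows both behaviours in isolation. AverageSketch and its method names are illustrative only, and the scale passed in is an arbitrary choice.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/** Sketch: averaging doubles with BigDecimal without tripping over non-terminating quotients. */
final class AverageSketch {

    /** Average of a non-empty array, rounded half-up to the given number of decimal places. */
    static double average(double[] values, int scale) {
        BigDecimal sum = BigDecimal.ZERO;
        for (double v : values) {
            sum = sum.add(BigDecimal.valueOf(v));
        }
        return sum.divide(BigDecimal.valueOf(values.length), scale, RoundingMode.HALF_UP).doubleValue();
    }

    /** Returns true if divide() without a scale or rounding mode throws for this pair. */
    static boolean exactDivideFails(BigDecimal dividend, BigDecimal divisor) {
        try {
            dividend.divide(divisor);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(exactDivideFails(BigDecimal.ONE, new BigDecimal("3")));
        System.out.println(average(new double[] {1.0, 1.0, 2.0}, 2));
    }
}
```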

3.4 Test the program

First insert some sample data into MongoDB:

db.record.insertMany([
  { temperature: "22.5", gather_time: new ISODate("2024-09-16T08:00:00Z") },
  { temperature: "23.0", gather_time: new ISODate("2024-09-16T09:00:00Z") },
  { temperature: "22.8", gather_time: new ISODate("2024-09-16T10:00:00Z") },
  { temperature: "21.9", gather_time: new ISODate("2024-09-16T11:00:00Z") },
  { temperature: "22.2", gather_time: new ISODate("2024-09-16T12:00:00Z") },
  { temperature: "23.1", gather_time: new ISODate("2024-09-17T08:00:00Z") },
  { temperature: "22.7", gather_time: new ISODate("2024-09-17T09:00:00Z") },
  { temperature: "22.9", gather_time: new ISODate("2024-09-17T10:00:00Z") },
  { temperature: "21.8", gather_time: new ISODate("2024-09-17T11:00:00Z") },
  { temperature: "22.3", gather_time: new ISODate("2024-09-17T12:00:00Z") }
]);

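Run the main method.

A successful run exits with code 0.

If the exit code is non-zero, scroll up and look at the last red exception message in the output.

After a successful run, the MySQL table should contain the computed rows.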
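
To check what the table should contain for the sample data, the daily figures can be recomputed outside Flink. The sketch below is plain Java with the ten sample readings from the insertMany call above copied in. ExpectedRowsSketch is a made-up name, and days are grouped by UTC date, which is an assumption that matches an unshifted one-day window.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Sketch: recompute per-day average, max and min for the sample documents. */
final class ExpectedRowsSketch {
    // {temperature, gather_time} pairs copied from the sample insert
    static final String[][] SAMPLE = {
        {"22.5", "2024-09-16T08:00:00Z"}, {"23.0", "2024-09-16T09:00:00Z"},
        {"22.8", "2024-09-16T10:00:00Z"}, {"21.9", "2024-09-16T11:00:00Z"},
        {"22.2", "2024-09-16T12:00:00Z"}, {"23.1", "2024-09-17T08:00:00Z"},
        {"22.7", "2024-09-17T09:00:00Z"}, {"22.9", "2024-09-17T10:00:00Z"},
        {"21.8", "2024-09-17T11:00:00Z"}, {"22.3", "2024-09-17T12:00:00Z"}
    };

    /** Groups rows by UTC calendar day and summarises the temperatures of each day. */
    static Map<String, DoubleSummaryStatistics> dailyStats(String[][] rows) {
        return Arrays.stream(rows).collect(Collectors.groupingBy(
                row -> Instant.parse(row[1]).atZone(ZoneOffset.UTC).toLocalDate().toString(),
                TreeMap::new,
                Collectors.summarizingDouble(row -> Double.parseDouble(row[0]))));
    }

    public static void main(String[] args) {
        dailyStats(SAMPLE).forEach((day, s) -> System.out.printf(Locale.ROOT,
                "%s avg=%.2f max=%.1f min=%.1f%n", day, s.getAverage(), s.getMax(), s.getMin()));
    }
}
```

For this data the expected figures are an average of 22.48 with max 23.0 and min 21.9 for 2024-09-16, and an average of 22.56 with max 23.1 and min 21.8 for 2024-09-17.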

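3.5 Build the jar

After the build finishes, the jar is in the output directory set in the Artifacts configuration.

Press Ctrl + Shift + C to copy the file path.

3.6 Submit to Flink

If you copied the file path with Ctrl + Shift + C above,

paste it with Ctrl + V into the file name field of the upload dialog and open the file.

After the upload, click the job that appears in the list.

Click Submit.

The job starts running.

Exception messages can be viewed in the Flink web UI.

Console output such as System.out.println can also be viewed in the web UI.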

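4. Stream processing

Logic: while the job is running (either run directly from the project or submitted to Flink), every insert into MongoDB is picked up by the job and placed in the window that holds that day's data. When data for the next day appears (meaning the first day is complete), the job computes the first day's average, maximum and minimum temperature.

The overall approach is the same as for batch processing, so this section only covers the differences.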
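
The "next day's data closes the previous day" behaviour comes from event-time watermarks. The plain-Java sketch below simulates it under simplifying assumptions: timestamps are treated as ascending, the watermark is the largest timestamp seen minus one millisecond, and a window fires once the watermark reaches its last millisecond. WindowFiringSketch, add and emitWatermark are hypothetical names, and emitWatermark stands in for the periodic watermark emission of the real job.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch: one-day event-time windows that fire when the watermark passes their end. */
final class WindowFiringSketch {
    static final long DAY_MS = 86_400_000L;
    private final TreeMap<Long, List<Double>> openWindows = new TreeMap<>();
    private long maxTimestamp = Long.MIN_VALUE;
    private long watermark = Long.MIN_VALUE;

    /** Buffers a reading in its day window; returns false if that window already fired (late data). */
    boolean add(long timestampMs, double temperature) {
        long start = timestampMs - Math.floorMod(timestampMs, DAY_MS);
        if (start + DAY_MS - 1 <= watermark) {
            return false;
        }
        openWindows.computeIfAbsent(start, s -> new ArrayList<>()).add(temperature);
        maxTimestamp = Math.max(maxTimestamp, timestampMs);
        return true;
    }

    /** Simulates one watermark emission; returns the windows that fire, keyed by window start. */
    Map<Long, List<Double>> emitWatermark() {
        Map<Long, List<Double>> fired = new TreeMap<>();
        if (maxTimestamp == Long.MIN_VALUE) {
            return fired;
        }
        watermark = Math.max(watermark, maxTimestamp - 1);
        while (!openWindows.isEmpty() && openWindows.firstKey() + DAY_MS - 1 <= watermark) {
            Map.Entry<Long, List<Double>> entry = openWindows.pollFirstEntry();
            fired.put(entry.getKey(), entry.getValue());
        }
        return fired;
    }

    public static void main(String[] args) {
        WindowFiringSketch sketch = new WindowFiringSketch();
        sketch.add(0L, 20.0);               // a reading on day 0
        sketch.add(DAY_MS + 1000L, 21.0);   // a reading on day 1
        System.out.println(sketch.emitWatermark().keySet()); // only the day-0 window fires
    }
}
```

A reading that arrives after its window has fired is dropped in this sketch, which mirrors the default handling of late data when no allowed lateness is configured.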

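4.1 Code

Replace the code from section 3.3 with the code below. The custom source relies on MongoDB change streams, which require the server to run as a replica set or sharded cluster.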

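import com.mongodb.client.ChangeStreamIterable;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoCursor;
import com.mongodb.client.MongoDatabase;
import com.mongodb.client.model.changestream.ChangeStreamDocument;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;
import org.bson.BsonDocument;

import java.util.Date;

public class MongoChangeStreamSource extends RichSourceFunction<Record> {
    private final String uri;
    private final String database;
    private final String collection;
    private volatile boolean isRunning = true;
    // Created in open() on the task side, so excluded from serialization
    private transient MongoClient mongoClient;
    private transient MongoCollection<BsonDocument> mongoCollection;
    private transient ChangeStreamIterable<BsonDocument> changeStream;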

    public MongoChangeStreamSource(String uri, String database, String collection) {
        this.uri = uri;
        this.database = database;
        this.collection = collection;
    }

    @Override
    public void open(Configuration parameters) throws Exception {
        super.open(parameters);
        mongoClient = MongoClients.create(uri);
        MongoDatabase mongoDatabase = mongoClient.getDatabase(database);
        mongoCollection = mongoDatabase.getCollection(collection, BsonDocument.class);
        changeStream = mongoCollection.watch();
    }

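    @Override
    public void run(SourceContext<Record> ctx) {
        // try-with-resources closes the cursor when the loop ends
        try (MongoCursor<ChangeStreamDocument<BsonDocument>> cursor = changeStream.iterator()) {
            while (isRunning && cursor.hasNext()) {
                ChangeStreamDocument<BsonDocument> next = cursor.next();
                BsonDocument fullDocument = next.getFullDocument();
                // Delete events (and updates, with the default watch() options) carry no full document
                if (fullDocument == null) {
                    continue;
                }
                synchronized (ctx.getCheckpointLock()) {
                    // Convert the MongoDB document into a Java object for the downstream computation
                    ctx.collect(deserialize(fullDocument));
                }
            }
        }
    }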

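    @Override
    public void cancel() {
        isRunning = false;
    }

    @Override
    public void close() throws Exception {
        // Release the MongoDB connection when the source shuts down
        if (mongoClient != null) {
            mongoClient.close();
        }
        super.close();
    }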

    private Record deserialize(BsonDocument document) {
        Record record = new Record();
        record.setTemperature(Double.parseDouble(document.getString("temperature").getValue()));
        record.setGatherTime(new Date(document.getDateTime("gather_time").getValue()));
        return record;
    }
}
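
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.connector.jdbc.JdbcConnectionOptions;
import org.apache.flink.connector.jdbc.JdbcExecutionOptions;
import org.apache.flink.connector.jdbc.JdbcSink;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.windowing.ProcessAllWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

import java.math.BigDecimal;
import java.math.RoundingMode;
import java.sql.SQLException;
import java.text.SimpleDateFormat;

public class StreamStatistics {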

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        MongoChangeStreamSource mongoSource = new MongoChangeStreamSource(
                "mongodb://127.0.0.1:27017",
                "flink_demo",
                "record"
        );
        DataStream<Record> sourceStream = env.addSource(mongoSource,"MongoChangeStreamSource");
        sourceStream
                // Use gatherTime as the event time that decides window membership
                .assignTimestampsAndWatermarks(
                        WatermarkStrategy
                                .<Record>forMonotonousTimestamps()
                                .withTimestampAssigner((record, timestamp) -> record.getGatherTime().getTime())
                )
                // Put all records from the same day into one time window
                .windowAll(TumblingEventTimeWindows.of(Time.days(1)))
                .process(new ProcessAllWindowFunction<Record, DeviceTemperature, TimeWindow>() {
                    @Override
                    public void process(ProcessAllWindowFunction<Record, DeviceTemperature, TimeWindow>.Context context, Iterable<Record> elements, Collector<DeviceTemperature> out) {
                        BigDecimal sum = new BigDecimal(0);
                        Double max = null;
                        Double min = null;
                        int size = 0;
                        String gatherDate = null;
                        for (Record element : elements) {
                            if (max == null || max < element.getTemperature()) {
                                max = element.getTemperature();
                            }
                            if (min == null || min > element.getTemperature()) {
                                min = element.getTemperature();
                            }

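                            // Sum with BigDecimal to avoid floating-point precision loss;
                            // valueOf uses the double's decimal string form rather than its exact binary value
                            sum = sum.add(BigDecimal.valueOf(element.getTemperature()));
                            size++;
                            // All records in this window belong to the same day, so the first record's date is enough
                            if (gatherDate == null) {
                                SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd");
                                gatherDate = formatter.format(element.getGatherTime());
                            }
                        }
                        // Give divide() a scale and rounding mode; without them it throws
                        // ArithmeticException when the quotient has no finite decimal expansion
                        double doubleValue = sum.divide(BigDecimal.valueOf(size), 4, RoundingMode.HALF_UP).doubleValue();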
                        DeviceTemperature result = DeviceTemperature.builder()
                                .id(CommonUtil.generateId())
                                .agv(doubleValue)
                                .max(max)
                                .min(min)
                                .gatherDate(gatherDate)
                                .build();
                        out.collect(result);
                    }
                }).addSink(JdbcSink.sink(
                        "INSERT INTO device_temperature (id, agv, max, min, gather_date) \n" +
                                "                        VALUES (?, ?, ?, ?, ?)",
                        (statement, deviceTemperature) -> {
                            // Bind the prepared-statement parameters for the MySQL insert
                            try {
                                statement.setString(1, deviceTemperature.getId());
                                statement.setDouble(2, deviceTemperature.getAgv());
                                statement.setDouble(3, deviceTemperature.getMax());
                                statement.setDouble(4, deviceTemperature.getMin());
                                statement.setString(5, deviceTemperature.getGatherDate());
                            } catch (SQLException e) {
                                throw new RuntimeException(e);
                            }
                        },
                        JdbcExecutionOptions.builder()
                                .withBatchSize(1) // rows are written together once batchSize rows are pending, or when all data has been processed (in stream processing the insert only runs once batchSize is reached)
                                .build(),
                        new JdbcConnectionOptions.JdbcConnectionOptionsBuilder()
                                .withUrl("jdbc:mysql://127.0.0.1:3306/flink_demo")
                                .withDriverName("com.mysql.cj.jdbc.Driver")
                                .withUsername("root")
                                .withPassword("123456")
                                .build()
                ));
        env.execute("Temperature streaming job");
    }

}

4.2 Test

While the job is running, insert data for the 13th and 14th into MongoDB:

db.record.insertMany([
  { temperature: "24.5", gather_time: new ISODate("2024-09-14T08:00:00Z") },
  { temperature: "23.5", gather_time: new ISODate("2024-09-14T09:00:00Z") },
  { temperature: "21.4", gather_time: new ISODate("2024-09-14T10:00:00Z") },
  { temperature: "22", gather_time: new ISODate("2024-09-14T11:00:00Z") },
  { temperature: "22.2", gather_time: new ISODate("2024-09-14T12:00:00Z") },
  { temperature: "23.1", gather_time: new ISODate("2024-09-13T08:00:00Z") },
  { temperature: "20", gather_time: new ISODate("2024-09-13T09:00:00Z") },
  { temperature: "27", gather_time: new ISODate("2024-09-13T10:00:00Z") },
  { temperature: "26.2", gather_time: new ISODate("2024-09-13T11:00:00Z") },
  { temperature: "24.3", gather_time: new ISODate("2024-09-13T12:00:00Z") }
]);

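MySQL now holds the statistics for the 13th, because the data for the 14th has moved event time past the end of the 13th.

Next, insert one more record dated the 15th into MongoDB: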

db.record.insertMany([
  { temperature: "20", gather_time: new ISODate("2024-09-15T08:00:00Z") }
]);

MySQL now holds the statistics for the 14th.
