For how to install Flink, see my earlier post:
Flink Installation (Standalone Mode, Windows) - CSDN Blog
一、Preface
Goal of this article: process the data stored in MongoDB, in both stream-processing and batch-processing form, and write the results into MySQL.
Tutorials that pair Flink with a simple Java program are easy to find online, so they are not repeated here.
二、Program Design
1、Create a Spring Boot project
I am using Java 8.
The Server URL is set to Alibaba Cloud's initializer, since the official Spring Boot initializer no longer supports Java 8:
https://start.aliyun.com
2、Dependencies
<properties>
    <java.version>1.8</java.version>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
    <spring-boot.version>2.6.13</spring-boot.version>
    <flink-version>1.14.6</flink-version>
</properties>
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-test</artifactId>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-java</artifactId>
        <version>${flink-version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-streaming-java_2.12</artifactId>
        <version>${flink-version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-clients_2.12</artifactId>
        <version>${flink-version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-connector-base</artifactId>
        <version>${flink-version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-connector-jdbc_2.12</artifactId>
        <version>${flink-version}</version>
    </dependency>
    <!-- lombok -->
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <version>1.18.30</version>
        <optional>true</optional>
    </dependency>
    <!-- mongodb -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-mongodb</artifactId>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-connector-mongodb</artifactId>
        <version>1.1.0-1.18</version>
    </dependency>
    <dependency>
        <groupId>com.alibaba.fastjson2</groupId>
        <artifactId>fastjson2</artifactId>
        <version>2.0.50</version>
    </dependency>
    <!-- MyBatis-Plus starter -->
    <dependency>
        <groupId>com.baomidou</groupId>
        <artifactId>mybatis-plus-boot-starter</artifactId>
        <version>3.5.6</version>
    </dependency>
    <!-- MySQL Driver -->
    <dependency>
        <groupId>mysql</groupId>
        <artifactId>mysql-connector-java</artifactId>
    </dependency>
</dependencies>
3、Batch processing
Logic: read all documents from the record collection in MongoDB, group them by day so that records from the same day land in the same time window (effectively that day's data set), compute each day's average, maximum and minimum temperature from the window contents, and write the results into the MySQL device_temperature table.
3.1 Table structures
MongoDB:
Database name: flink_demo
Collection name: record
Document structure:
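Each document carries two fields, temperature (stored as a string) and gather_time (stored as a date), which matches the deserialization code in 3.3 and the test data inserted in 3.4. For example:
{ temperature: "22.5", gather_time: ISODate("2024-09-16T08:00:00Z") }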
Entity class:
@Data
public class Record {
    /** Device temperature */
    private Double temperature;
    /** Collection time */
    private Date gatherTime;
}
MySQL
Table structure:
CREATE TABLE device_temperature (
    id VARCHAR(50) PRIMARY KEY,
    agv DOUBLE NOT NULL,
    max DOUBLE NOT NULL,
    min DOUBLE NOT NULL,
    gather_date VARCHAR(50)
);
Entity class:
@Data
@Builder
@NoArgsConstructor
@AllArgsConstructor
public class DeviceTemperature {
    private String id;
    /** Average temperature */
    private Double agv;
    /** Maximum temperature */
    private Double max;
    /** Minimum temperature */
    private Double min;
    /** Collection date, formatted as yyyy-MM-dd */
    private String gatherDate;
}
3.2 Utility class
public class CommonUtil {
    // Used to generate IDs; the counter is incremented every time an ID is generated
    private static int tag = 0;
    public static String generateId() {
        SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMddHHmmss");
        String id = sdf.format(System.currentTimeMillis());
        tag++;
        if (tag > 9999) {
            tag = 0;
        }
        return id + String.format("%04d", tag);
    }
}
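A tiny, purely illustrative demo of the ID format (the class name and the timestamp in the comment are made up for illustration):
public class CommonUtilDemo {
    public static void main(String[] args) {
        // Prints something like 202409160800000001 and 202409160800000002:
        // a 14-character yyyyMMddHHmmss timestamp followed by a 4-digit counter,
        // which fits comfortably into the VARCHAR(50) id column.
        System.out.println(CommonUtil.generateId());
        System.out.println(CommonUtil.generateId());
    }
}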
3.3 Main method
public class BatchStatistics {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        MongoDeserializationSchema<Record> deserializationSchema = new MongoDeserializationSchema<Record>() {
            @Override
            public Record deserialize(BsonDocument document) {
                // Convert the BSON document read from MongoDB into a Java object for the downstream computation
                Record record = new Record();
                record.setTemperature(Double.parseDouble(document.getString("temperature").getValue()));
                record.setGatherTime(new Date(document.getDateTime("gather_time").getValue()));
                return record;
            }
            @Override
            public TypeInformation<Record> getProducedType() {
                return TypeInformation.of(Record.class);
            }
        };
        MongoSource<Record> mongoSource = MongoSource.<Record>builder()
                .setUri("mongodb://127.0.0.1:27017")        // MongoDB connection URI
                .setDatabase("flink_demo")                  // database
                .setCollection("record")                    // collection
                .setFetchSize(100)                          // how many documents to fetch from the MongoDB server per round trip
                .setLimit(3000000)                          // maximum number of documents to read; anything beyond this is not processed
                .setNoCursorTimeout(true)                   // if true, the MongoDB cursor never times out from inactivity; useful for long-running reads
                .setPartitionStrategy(PartitionStrategy.SAMPLE)
                .setPartitionSize(MemorySize.parse("2GB"))  // memory size of each partition
                .setSamplesPerPartition(10)                 // number of samples per partition
                .setDeserializationSchema(deserializationSchema)
                .build();
        DataStream<Record> sourceStream = env.fromSource(mongoSource, WatermarkStrategy.noWatermarks(), "MongoSource");
        sourceStream
                // use gatherTime as the event time for window assignment
                .assignTimestampsAndWatermarks(
                        WatermarkStrategy
                                .<Record>forMonotonousTimestamps()
                                .withTimestampAssigner((record, timestamp) -> record.getGatherTime().getTime())
                )
                // put all records from the same day into one time window
                .windowAll(TumblingEventTimeWindows.of(Time.days(1)))
                .process(new ProcessAllWindowFunction<Record, DeviceTemperature, TimeWindow>() {
                    @Override
                    public void process(ProcessAllWindowFunction<Record, DeviceTemperature, TimeWindow>.Context context, Iterable<Record> elements, Collector<DeviceTemperature> out) {
                        BigDecimal sum = new BigDecimal(0);
                        Double max = null;
                        Double min = null;
                        int size = 0;
                        String gatherDate = null;
                        for (Record element : elements) {
                            if (max == null || max < element.getTemperature()) {
                                max = element.getTemperature();
                            }
                            if (min == null || min > element.getTemperature()) {
                                min = element.getTemperature();
                            }
                            // accumulate with BigDecimal to avoid floating-point precision loss
                            sum = sum.add(BigDecimal.valueOf(element.getTemperature()));
                            size++;
                            // all records in this window belong to the same day, so the date of the first record is enough
                            if (gatherDate == null) {
                                SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd");
                                gatherDate = formatter.format(element.getGatherTime());
                            }
                        }
                        // divide with an explicit scale and rounding mode; otherwise a non-terminating decimal would throw ArithmeticException
                        double doubleValue = sum.divide(BigDecimal.valueOf(size), 2, RoundingMode.HALF_UP).doubleValue();
                        DeviceTemperature result = DeviceTemperature.builder()
                                .id(CommonUtil.generateId())
                                .agv(doubleValue)
                                .max(max)
                                .min(min)
                                .gatherDate(gatherDate)
                                .build();
                        out.collect(result);
                    }
                }).addSink(JdbcSink.sink(
                        "INSERT INTO device_temperature (id, agv, max, min, gather_date) \n" +
                                " VALUES (?, ?, ?, ?, ?)",
                        (statement, deviceTemperature) -> {
                            // bind the parameters of the prepared statement before inserting into MySQL
                            try {
                                statement.setString(1, deviceTemperature.getId());
                                statement.setDouble(2, deviceTemperature.getAgv());
                                statement.setDouble(3, deviceTemperature.getMax());
                                statement.setDouble(4, deviceTemperature.getMin());
                                statement.setString(5, deviceTemperature.getGatherDate());
                            } catch (SQLException e) {
                                throw new RuntimeException(e);
                            }
                        },
                        JdbcExecutionOptions.builder()
                                .withBatchSize(1) // execute the insert once batchSize records have accumulated, or when all input has been processed (in streaming mode the insert only runs once batchSize records have accumulated)
                                .build(),
                        new JdbcConnectionOptions.JdbcConnectionOptionsBuilder()
                                .withUrl("jdbc:mysql://127.0.0.1:3306/flink_demo")
                                .withDriverName("com.mysql.cj.jdbc.Driver")
                                .withUsername("root")
                                .withPassword("123456")
                                .build()
                ));
        env.execute("Temperature batch job");
    }
}
3.4 Testing the program
First insert some sample data into MongoDB:
db.record.insertMany([
    { temperature: "22.5", gather_time: new ISODate("2024-09-16T08:00:00Z") },
    { temperature: "23.0", gather_time: new ISODate("2024-09-16T09:00:00Z") },
    { temperature: "22.8", gather_time: new ISODate("2024-09-16T10:00:00Z") },
    { temperature: "21.9", gather_time: new ISODate("2024-09-16T11:00:00Z") },
    { temperature: "22.2", gather_time: new ISODate("2024-09-16T12:00:00Z") },
    { temperature: "23.1", gather_time: new ISODate("2024-09-17T08:00:00Z") },
    { temperature: "22.7", gather_time: new ISODate("2024-09-17T09:00:00Z") },
    { temperature: "22.9", gather_time: new ISODate("2024-09-17T10:00:00Z") },
    { temperature: "21.8", gather_time: new ISODate("2024-09-17T11:00:00Z") },
    { temperature: "22.3", gather_time: new ISODate("2024-09-17T12:00:00Z") }
]);
Run the main method.
A successful run exits with code 0.
If it returns a non-zero code, scroll up and look at the last red exception message.
After a normal run, the correct rows should have been inserted into the MySQL table.
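As a rough sanity check (the numbers below are worked out by hand from the sample data above; the id column will differ because it is generated at runtime):
SELECT agv, max, min, gather_date FROM device_temperature ORDER BY gather_date;
-- agv = (22.5 + 23.0 + 22.8 + 21.9 + 22.2) / 5 = 22.48, max = 23.0, min = 21.9, gather_date = 2024-09-16
-- agv = (23.1 + 22.7 + 22.9 + 21.8 + 22.3) / 5 = 22.56, max = 23.1, min = 21.8, gather_date = 2024-09-17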
3.5 Building the jar
After the build finishes, the generated jar is in the output directory that was configured earlier in the artifact settings (the default path is shown in the screenshot).
You can press Ctrl + Shift + C in IDEA to copy the file path directly.
3.6 Submitting to Flink
If you copied the file path with Ctrl + Shift + C above,
you can simply press Ctrl + V in the file name field of the upload dialog and open it.
After the upload, click the job entry that appears.
Click Submit.
The job starts running.
Exception details can be viewed under the job's Exceptions tab.
Console output such as System.out.println can be viewed in the TaskManager's stdout log, following the steps below.
4、Stream processing
Logic: while the job is running (either started directly from the IDE or submitted to Flink), every document inserted into MongoDB is picked up by the job and placed into the window for its day. Once data from the next day shows up (meaning the previous day's data is complete and the watermark has passed the end of that day's window), the previous day's average, maximum and minimum temperature are computed.
The overall approach is the same as before, so this section only covers the parts that differ from the batch job.
4.1 Code
Replace the code from section 3.3 with the following. Note that MongoDB change streams require the server to run as a replica set (or sharded cluster); they are not available on a standalone mongod.
public class MongoChangeStreamSource extends RichSourceFunction<Record> {
    private final String uri;
    private final String database;
    private final String collection;
    private volatile boolean isRunning = true;
    private MongoClient mongoClient;
    private MongoCollection<BsonDocument> mongoCollection;
    private ChangeStreamIterable<BsonDocument> changeStream;

    public MongoChangeStreamSource(String uri, String database, String collection) {
        this.uri = uri;
        this.database = database;
        this.collection = collection;
    }

    @Override
    public void open(Configuration parameters) throws Exception {
        super.open(parameters);
        mongoClient = MongoClients.create(uri);
        MongoDatabase mongoDatabase = mongoClient.getDatabase(database);
        mongoCollection = mongoDatabase.getCollection(collection, BsonDocument.class);
        changeStream = mongoCollection.watch();
    }

    @Override
    public void run(SourceContext<Record> ctx) {
        MongoCursor<ChangeStreamDocument<BsonDocument>> cursor = changeStream.iterator();
        while (isRunning && cursor.hasNext()) {
            ChangeStreamDocument<BsonDocument> next = cursor.next();
            synchronized (ctx.getCheckpointLock()) {
                // convert the change-stream document into a Java object for the downstream computation
                ctx.collect(deserialize(next.getFullDocument()));
            }
        }
    }

    @Override
    public void cancel() {
        isRunning = false;
    }

    private Record deserialize(BsonDocument document) {
        Record record = new Record();
        record.setTemperature(Double.parseDouble(document.getString("temperature").getValue()));
        record.setGatherTime(new Date(document.getDateTime("gather_time").getValue()));
        return record;
    }
}
public class StreamStatistics {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        MongoChangeStreamSource mongoSource = new MongoChangeStreamSource(
                "mongodb://127.0.0.1:27017",
                "flink_demo",
                "record"
        );
        DataStream<Record> sourceStream = env.addSource(mongoSource, "MongoChangeStreamSource");
        sourceStream
                // use gatherTime as the event time for window assignment
                .assignTimestampsAndWatermarks(
                        WatermarkStrategy
                                .<Record>forMonotonousTimestamps()
                                .withTimestampAssigner((record, timestamp) -> record.getGatherTime().getTime())
                )
                // put all records from the same day into one time window
                .windowAll(TumblingEventTimeWindows.of(Time.days(1)))
                .process(new ProcessAllWindowFunction<Record, DeviceTemperature, TimeWindow>() {
                    @Override
                    public void process(ProcessAllWindowFunction<Record, DeviceTemperature, TimeWindow>.Context context, Iterable<Record> elements, Collector<DeviceTemperature> out) {
                        BigDecimal sum = new BigDecimal(0);
                        Double max = null;
                        Double min = null;
                        int size = 0;
                        String gatherDate = null;
                        for (Record element : elements) {
                            if (max == null || max < element.getTemperature()) {
                                max = element.getTemperature();
                            }
                            if (min == null || min > element.getTemperature()) {
                                min = element.getTemperature();
                            }
                            // accumulate with BigDecimal to avoid floating-point precision loss
                            sum = sum.add(BigDecimal.valueOf(element.getTemperature()));
                            size++;
                            // all records in this window belong to the same day, so the date of the first record is enough
                            if (gatherDate == null) {
                                SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd");
                                gatherDate = formatter.format(element.getGatherTime());
                            }
                        }
                        // divide with an explicit scale and rounding mode; otherwise a non-terminating decimal would throw ArithmeticException
                        double doubleValue = sum.divide(BigDecimal.valueOf(size), 2, RoundingMode.HALF_UP).doubleValue();
                        DeviceTemperature result = DeviceTemperature.builder()
                                .id(CommonUtil.generateId())
                                .agv(doubleValue)
                                .max(max)
                                .min(min)
                                .gatherDate(gatherDate)
                                .build();
                        out.collect(result);
                    }
                }).addSink(JdbcSink.sink(
                        "INSERT INTO device_temperature (id, agv, max, min, gather_date) \n" +
                                " VALUES (?, ?, ?, ?, ?)",
                        (statement, deviceTemperature) -> {
                            // bind the parameters of the prepared statement before inserting into MySQL
                            try {
                                statement.setString(1, deviceTemperature.getId());
                                statement.setDouble(2, deviceTemperature.getAgv());
                                statement.setDouble(3, deviceTemperature.getMax());
                                statement.setDouble(4, deviceTemperature.getMin());
                                statement.setString(5, deviceTemperature.getGatherDate());
                            } catch (SQLException e) {
                                throw new RuntimeException(e);
                            }
                        },
                        JdbcExecutionOptions.builder()
                                .withBatchSize(1) // execute the insert once batchSize records have accumulated, or when all input has been processed (in streaming mode the insert only runs once batchSize records have accumulated)
                                .build(),
                        new JdbcConnectionOptions.JdbcConnectionOptionsBuilder()
                                .withUrl("jdbc:mysql://127.0.0.1:3306/flink_demo")
                                .withDriverName("com.mysql.cj.jdbc.Driver")
                                .withUsername("root")
                                .withPassword("123456")
                                .build()
                ));
        env.execute("Temperature stream job");
    }
}
4.2 Testing
While the job is running, insert data for the 13th and 14th into MongoDB:
db.record.insertMany([
    { temperature: "24.5", gather_time: new ISODate("2024-09-14T08:00:00Z") },
    { temperature: "23.5", gather_time: new ISODate("2024-09-14T09:00:00Z") },
    { temperature: "21.4", gather_time: new ISODate("2024-09-14T10:00:00Z") },
    { temperature: "22", gather_time: new ISODate("2024-09-14T11:00:00Z") },
    { temperature: "22.2", gather_time: new ISODate("2024-09-14T12:00:00Z") },
    { temperature: "23.1", gather_time: new ISODate("2024-09-13T08:00:00Z") },
    { temperature: "20", gather_time: new ISODate("2024-09-13T09:00:00Z") },
    { temperature: "27", gather_time: new ISODate("2024-09-13T10:00:00Z") },
    { temperature: "26.2", gather_time: new ISODate("2024-09-13T11:00:00Z") },
    { temperature: "24.3", gather_time: new ISODate("2024-09-13T12:00:00Z") }
]);
The MySQL table now contains the statistics for the 13th.
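Worked out by hand from the five records for the 13th above, the row should be approximately: agv = (23.1 + 20 + 27 + 26.2 + 24.3) / 5 = 24.12, max = 27, min = 20, gather_date = 2024-09-13. The window for the 14th has not fired yet, because no record with a later timestamp has arrived to push the watermark past the end of that day.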
Then insert one record for the 15th into MongoDB:
db.record.insertMany([
    { temperature: "20", gather_time: new ISODate("2024-09-15T08:00:00Z") }
]);
The MySQL table now also contains the statistics for the 14th.
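Again worked out by hand from the 14th's records, the row should be approximately: agv = (24.5 + 23.5 + 21.4 + 22 + 22.2) / 5 = 22.72, max = 24.5, min = 21.4, gather_date = 2024-09-14.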