Getting Started with Flink: A Learning Summary
I first came across Flink through the book 《Flink原理、实战与性能优化》 (Flink: Principles, Practice, and Performance Tuning), which gave me my first impression of the framework. A reporting feature in a later project gave me my first chance to study and use it in depth. This post records and summarizes that learning process; I hope to keep deepening my understanding and to draw on it for more ideas and solutions in future work.
Introduction to Flink
Apache Flink is a distributed open-source computing framework for both data stream processing and batch data processing. Built on a single streaming execution model, it supports both application types. Because stream and batch processing offer very different SLAs (service-level agreements), with stream processing generally requiring low latency and exactly-once guarantees while batch processing requires high throughput and efficient processing, implementations have traditionally provided two separate code paths, or a dedicated open-source framework for each approach. Typical examples: MapReduce and Spark for batch processing, and Storm for stream processing; Spark Streaming is essentially micro-batching.
Flink takes a completely different view from these traditional approaches and unifies the two: Flink fully supports stream processing, meaning the input is treated as an unbounded data stream, while batch processing is treated as a special case of stream processing whose input stream happens to be bounded.
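Outside Flink, the same idea can be sketched in a few lines of plain Java (the names here are illustrative, not Flink API): a routine written against an Iterator does not care whether the iterator is backed by a finite collection (bounded, i.e. batch) or an endless generator (unbounded, i.e. stream).

```java
import java.util.Iterator;
import java.util.List;
import java.util.stream.Stream;

public class BoundedUnbounded {
    // One routine, written once, that consumes any iterator up to `limit` elements.
    // A bounded input simply runs out of elements before the limit is reached.
    static long sum(Iterator<Integer> input, int limit) {
        long total = 0;
        int seen = 0;
        while (input.hasNext() && seen < limit) {
            total += input.next();
            seen++;
        }
        return total;
    }

    public static void main(String[] args) {
        // Bounded input ("batch"): a finite list
        long batch = sum(List.of(1, 2, 3).iterator(), 1000);
        // Unbounded input ("stream"): an endless generator, cut off by the caller
        long stream = sum(Stream.iterate(1, x -> x + 1).iterator(), 3);
        System.out.println(batch + " " + stream); // prints "6 6"
    }
}
```

The processing code is identical in both calls; only the nature of the input differs, which is exactly the relationship Flink establishes between its batch and stream workloads.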
Evolution of Data Architectures
- Traditional data infrastructure (monolithic)
- Microservices architecture
- Big-data Lambda architecture
- Stateful stream-computing architecture
Key Advantages of Flink
- High throughput, low latency, and high performance at the same time
- Support for event-time semantics
- Support for stateful computation
- Highly flexible window operations
- Fault tolerance based on lightweight distributed snapshots
- Independent memory management on top of the JVM
- Savepoints
Flink Application Scenarios
- Real-time intelligent recommendation
- Complex event processing
- Real-time fraud detection
- Real-time data warehousing and ETL
- Streaming data analytics
- Real-time report analytics
A Flink Example
1. Data ingestion: a custom log instance receives the data and publishes it to Kafka. Having the logger receive and forward events cleanly decouples the business logic from the application data pipeline.
Logger instantiation:
private static final Logger kafkaEventLogger = LoggerFactory.getLogger("kafka-event");
Log output:
kafkaEventLogger.info(JSONObject.toJSONString(bsExchangeRecord));
Logback configuration:
<springProperty scope="context" name="LOGBACK_SERVERS" source="spring.kafka.bootstrap-servers"/>
<!-- Route the kafka-event logger to Kafka -->
<logger name="kafka-event" additivity="false">
    <appender-ref ref="AsyncKafkaAppender" />
</logger>
<appender name="AsyncKafkaAppender" class="ch.qos.logback.classic.AsyncAppender">
    <appender-ref ref="kafkaAppender" />
</appender>
<appender name="kafkaAppender" class="com.github.danielwegener.logback.kafka.KafkaAppender">
    <encoder>
        <pattern>%message</pattern>
        <charset>utf8</charset>
    </encoder>
    <topic>streamingLog</topic>
    <keyingStrategy class="com.github.danielwegener.logback.kafka.keying.NoKeyKeyingStrategy"/>
    <deliveryStrategy class="com.github.danielwegener.logback.kafka.delivery.AsynchronousDeliveryStrategy"/>
    <producerConfig>bootstrap.servers=${LOGBACK_SERVERS}</producerConfig>
    <producerConfig>retries=1</producerConfig>
</appender>
2. The Flink job
The Flink job is deployed independently and handles the real-time computation and persistence of the data the reports need.
Maven dependencies:
<!-- Flink dependencies -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-java</artifactId>
    <version>1.10.1</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_2.12</artifactId>
    <version>1.10.1</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-runtime-web_2.12</artifactId>
    <version>1.10.1</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-planner_2.12</artifactId>
    <version>1.10.1</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-planner-blink_2.12</artifactId>
    <version>1.10.1</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka-0.11_2.12</artifactId>
    <version>1.10.1</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-redis_2.10</artifactId>
    <version>1.1.5</version>
</dependency>
Job code:
The main method bootstraps and runs the job.
1. Obtain the execution environment
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(1000, CheckpointingMode.EXACTLY_ONCE);
// Make sure at least 500 ms of progress happen between checkpoints
env.getCheckpointConfig().setMinPauseBetweenCheckpoints(500);
// A checkpoint has to finish within one minute, or it is discarded
env.getCheckpointConfig().setCheckpointTimeout(60000);
// Allow only one checkpoint in flight at a time
env.getCheckpointConfig().setMaxConcurrentCheckpoints(1);
// env.setStateBackend(new FsStateBackend("hdfs://namenode:40010/flink/checkpoints"));
env.setStateBackend(new MemoryStateBackend(MAX_MEM_STATE_SIZE, false));
// Number of restart attempts and the delay between them
env.setRestartStrategy(RestartStrategies.fixedDelayRestart(3,
        org.apache.flink.api.common.time.Time.of(10, TimeUnit.SECONDS)));
env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime);
2. Load and initialize the source data
// Create the Kafka connector source
FlinkKafkaConsumer010<String> consumer010 = new FlinkKafkaConsumer010<>(BaseConfig.getKafkaTopicConfig(), new SimpleStringSchema(), BaseConfig.getKafkaBaseConfig());
// Add the source to the environment
DataStreamSource<String> inputStream = env.addSource(consumer010);
3. Transform the stream and split it with side outputs
// Use side outputs to split the stream into battery, cabinet, and user dimensions
final OutputTag<BsExchangeRecord> batterySide = new OutputTag<BsExchangeRecord>("batterySide"){};
final OutputTag<BsExchangeRecord> cabinetSide = new OutputTag<BsExchangeRecord>("cabinetSide"){};
final OutputTag<BsExchangeRecord> userSide = new OutputTag<BsExchangeRecord>("userSide"){};
// 1. Split the stream
// Parse the JSON payload from Kafka into a Java bean
SingleOutputStreamOperator<BsExchangeRecord> sideDataStream = inputStream
        .filter((FilterFunction<String>) value -> StringUtils.isNotBlank(value) && JSONUtil.isJson(value))
        .map((MapFunction<String, BsExchangeRecord>) value -> JSONObject.parseObject(value, BsExchangeRecord.class))
        .process(new ProcessFunction<BsExchangeRecord, BsExchangeRecord>() {
            @Override
            public void processElement(BsExchangeRecord bsExchangeRecord, Context ctx, Collector<BsExchangeRecord> out) throws Exception {
                String batteryId = bsExchangeRecord.getBatteryId();
                String cabinetId = bsExchangeRecord.getCabinetId();
                String userId = bsExchangeRecord.getUserId();
                if (StringUtils.isNotEmpty(batteryId)) {
                    ctx.output(batterySide, bsExchangeRecord);
                }
                if (StringUtils.isNotEmpty(cabinetId)) {
                    ctx.output(cabinetSide, bsExchangeRecord);
                }
                if (StringUtils.isNotEmpty(userId)) {
                    ctx.output(userSide, bsExchangeRecord);
                } else {
                    // Records without a userId fall through to the main output
                    out.collect(bsExchangeRecord);
                }
            }
        });
DataStream<BsExchangeRecord> dataStreamBattery = sideDataStream.getSideOutput(batterySide);
DataStream<BsExchangeRecord> dataStreamCabinet = sideDataStream.getSideOutput(cabinetSide);
DataStream<BsExchangeRecord> dataStreamUser = sideDataStream.getSideOutput(userSide);
4. Window each split stream, aggregate, and write the results to the store
DataStream<BsBatteryStatusInfo> processDataStream = dataStreamBattery.keyBy(data -> data.getBatteryId())
        // Build a tumbling time window
        .timeWindow(Time.seconds(windowLengthSeconds))
        // Window function: count how many times each battery was swapped during the window
        .process(new BatteryWindowFunction());
processDataStream.addSink(new MongoDBBatterySink());
processDataStream.print();
DataStream<BsCabinetStatusInfo> processDataStreamCabinet = dataStreamCabinet.keyBy(data -> data.getCabinetId())
        // Build a tumbling time window
        .timeWindow(Time.seconds(windowLengthSeconds))
        // Window function: count how many swaps happened at each cabinet during the window
        .process(new CabinetWindowFunction());
processDataStreamCabinet.addSink(new MongoDBCabinetSink());
processDataStreamCabinet.print();
DataStream<BsUserStatusInfo> processDataStreamUser = dataStreamUser.keyBy(data -> data.getUserId())
        // Build a tumbling time window
        .timeWindow(Time.seconds(windowLengthSeconds))
        // Window function: count how many swaps each user performed during the window
        .process(new UserWindowFunction());
processDataStreamUser.addSink(new MongoDBUserSink());
processDataStreamUser.print();
// Nothing actually runs until the job is submitted
env.execute();
5. Inside the window function, count the swap events in each window, update the stored state, and emit the accumulated totals (window count plus state) downstream (battery dimension shown as the example)
package com.energy.model.process;
import cn.hutool.core.util.NumberUtil;
import com.energy.model.BsBatteryStatusInfo;
import com.energy.model.BsExchangeRecord;
import com.energy.model.process.valuestate.TimeType;
import com.energy.model.process.valuestate.ValueStateUtils;
import lombok.extern.slf4j.Slf4j;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.table.shaded.org.joda.time.DateTime;
import org.apache.flink.table.shaded.org.joda.time.DateTimeZone;
import org.apache.flink.util.Collector;
/**
* @description:
* @author: Mr.H
* @create: 2021/1/19
**/
@Slf4j
public class BatteryWindowFunction extends ProcessWindowFunction<BsExchangeRecord, BsBatteryStatusInfo, String, TimeWindow> {
// Keyed state: day/month markers and the accumulated swap counts
private ValueState<String> exchangeDayState; // marker for the current swap day
private ValueState<String> exchangeMonthState; // marker for the current swap month
private ValueState<Long> excBatteryDayNumState; // accumulated swap count for the day
private ValueState<Long> excBatteryMonthNumState; // accumulated swap count for the month
@Override
public void process(String key, Context context, Iterable<BsExchangeRecord> elements, Collector<BsBatteryStatusInfo> out) throws Exception {
BsBatteryStatusInfo bsBatteryStatusInfo = new BsBatteryStatusInfo();
try{
String exchangeDay = exchangeDayState.value();
String exchangeMonth = exchangeMonthState.value();
Long exchangeDayNum = excBatteryDayNumState.value();
Long exchangeMonthNum = excBatteryMonthNumState.value();
String excDay = ""; //昨日换电日期
String excMonth = ""; //昨日换电月份
long hourNum = 0; //小时内换电次数
long dayNum = 0; //当日换电次数
long monthNum = 0; //当月换电次数
long averageExcNum = 0; //日平均换电次数
for (BsExchangeRecord element : elements) {
if(hourNum==0){
excDay = element.getExchangeDate();
excMonth = element.getExchangeMonth();
}
hourNum += 1;
}
// Update state and compute the running swap counts; reset a counter when its period rolls over
if (excMonth != null && !excMonth.equals(exchangeMonth)) {
// A new month started: reset the monthly counter to this window's count
exchangeMonthState.update(excMonth);
monthNum = hourNum;
} else {
monthNum = (exchangeMonthNum == null ? 0 : exchangeMonthNum) + hourNum;
}
excBatteryMonthNumState.update(monthNum);
if (excDay != null && !excDay.equals(exchangeDay)) {
// A new day started: reset the daily counter to this window's count
exchangeDayState.update(excDay);
dayNum = hourNum;
} else {
dayNum = (exchangeDayNum == null ? 0 : exchangeDayNum) + hourNum;
}
excBatteryDayNumState.update(dayNum);
// Day of month, parsed from the tail of the date string (depends on the upstream date format)
long intervalDay = Integer.valueOf(excDay.substring(4));
// Divide as doubles so the average is rounded, not truncated by integer division
averageExcNum = Math.round((double) monthNum / intervalDay);
bsBatteryStatusInfo.setBatteryId(key);
bsBatteryStatusInfo.setBatteryDayExchangeNum(dayNum);
bsBatteryStatusInfo.setBatteryMonthExchangeNum(monthNum);
bsBatteryStatusInfo.setBatteryDayAverageExcNum(averageExcNum);
bsBatteryStatusInfo.setExchangeDate(excDay);
bsBatteryStatusInfo.setExchangeMonth(excMonth);
String windowStart=new DateTime(context.window().getStart(), DateTimeZone.forID("+08:00")).toString("yyyy-MM-dd HH:mm:ss");
String windowEnd=new DateTime(context.window().getEnd(), DateTimeZone.forID("+08:00")).toString("yyyy-MM-dd HH:mm:ss");
log.info("Key: "+key+" 窗口开始时间: "+windowStart+" 窗口结束时间: "+windowEnd+" 电池的使用次数: "+dayNum);
} catch (Exception e) {
// Pass the exception itself so the full stack trace is logged
log.error("Error while processing window data", e);
}
out.collect(bsBatteryStatusInfo);
}
@Override
public void open(Configuration parameters) throws Exception {
long timeDay = 1; // TTL of the daily state, in days
long timeMonth = 31; // TTL of the monthly state, in days
// Create the TTL-aware state descriptors and obtain the state handles
ValueStateDescriptor dayState = new ValueStateUtils().initValueState("exchangeDayState", String.class,timeDay, TimeType.DAY);
ValueStateDescriptor monthState = new ValueStateUtils().initValueState("exchangeMonthState", String.class,timeMonth, TimeType.DAY);
ValueStateDescriptor dayNumState = new ValueStateUtils().initValueState("excBatteryDayNumState", Long.class,timeDay, TimeType.DAY);
ValueStateDescriptor monthNumState = new ValueStateUtils().initValueState("excBatteryMonthNumState", Long.class,timeMonth, TimeType.DAY);
exchangeDayState = getRuntimeContext().getState(dayState);
exchangeMonthState = getRuntimeContext().getState(monthState);
excBatteryDayNumState = getRuntimeContext().getState(dayNumState);
excBatteryMonthNumState = getRuntimeContext().getState(monthNumState);
}
@Override
public void close() throws Exception {
}
}
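The rollover bookkeeping in BatteryWindowFunction is the part that is easiest to get wrong (counters not reset across a day or month boundary, integer division truncating the average), so it is worth isolating where it can be unit-tested without a Flink cluster. Below is a minimal sketch of the intended semantics; the class and method names are illustrative, not part of the project:

```java
public class SwapCounter {
    // Mimics the keyed ValueState fields of the window function
    private String currentDay;
    private String currentMonth;
    private long dayTotal;
    private long monthTotal;

    // Fold one window's count into the running day/month totals,
    // resetting each counter when its period rolls over.
    public void addWindow(String day, String month, long windowCount) {
        if (!month.equals(currentMonth)) {
            currentMonth = month;
            monthTotal = windowCount; // new month: reset
        } else {
            monthTotal += windowCount;
        }
        if (!day.equals(currentDay)) {
            currentDay = day;
            dayTotal = windowCount; // new day: reset
        } else {
            dayTotal += windowCount;
        }
    }

    public long dayTotal()   { return dayTotal; }
    public long monthTotal() { return monthTotal; }

    // Average swaps per elapsed day of the month, using floating-point
    // division so the result is rounded rather than truncated.
    public long averagePerDay(int dayOfMonth) {
        return Math.round((double) monthTotal / dayOfMonth);
    }
}
```

Each addWindow call plays the role of one window firing for a given key; the two if blocks mirror the ValueState updates, and averagePerDay shows the floating-point division needed before rounding.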
6. Persist the results
package com.energy.datahandler.sink;
import com.energy.framework.config.BaseConfig;
import com.energy.framework.data.MongoProperties;
import com.energy.model.BsBatteryStatusInfo;
import com.energy.utils.MogoConnectUtil;
import com.mongodb.MongoClient;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.UpdateOptions;
import com.mongodb.client.result.UpdateResult;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import org.apache.flink.streaming.api.functions.sink.SinkFunction;
import org.bson.Document;
import org.bson.conversions.Bson;
/**
* @description:
* @author: Mr.H
* @create: 2021/1/18
**/
public class MongoDBBatterySink extends RichSinkFunction<BsBatteryStatusInfo> {
MongoClient mongoClient = null;
private MongoProperties mongoProperties;
@Override
public void invoke(BsBatteryStatusInfo value, SinkFunction.Context context) {
try {
// Lazily create the MongoDB client on first use inside the task
if (mongoClient == null) {
mongoClient = MogoConnectUtil.initConnect();
System.out.println("MongoDB client initialized");
}
// NOTE: mongoProperties is never assigned in this class; it must be initialized (e.g. in open()) before invoke() runs
String mongoDataBase = mongoProperties.getMongoDataBase();
MongoDatabase db = mongoClient.getDatabase(mongoDataBase);
MongoCollection<Document> collection = db.getCollection("batteryStatusInfo");
// options.arrayFilters() can be used to update nested sub-documents in MongoDB
// Battery-dimension swap-record statistics
Bson filter = Filters.eq("batteryId", value.getBatteryId());
Bson update = new Document("$set",
new Document()
.append("exchangeDate", value.getExchangeDate())
.append("exchangeMonth", value.getExchangeMonth())
.append("batteryDayExchangeNum", value.getBatteryDayExchangeNum())
.append("batteryMonthExchangeNum", value.getBatteryMonthExchangeNum())
.append("batteryDayAverageExcNum", value.getBatteryDayAverageExcNum()));
UpdateResult updateResult = collection.updateOne(filter, update, new UpdateOptions().upsert(true));
} catch (Exception e) {
e.printStackTrace();
}
}
@Override
public void open(Configuration parms) throws Exception {
super.open(parms);
}
@Override
public void close() throws Exception {
super.close();
if (mongoClient != null) {
mongoClient.close();
}
}
}