Flink Practice: Fraud Detection Example (with a Time Window)
Background
In the previous example, the core detection logic was: for the same account, if one transaction is below 1 yuan and the next is above 100 yuan, the account is flagged as potentially fraudulent. As a beginner exercise it already gave us a taste of the power of Flink state.
This example adds a time dimension, so the fraud rule changes slightly: if the same account has two transactions within five minutes, one below 1 yuan and one above 100 yuan, we consider the account potentially fraudulent and generate an alert.
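To make the rule itself concrete before wiring it into Flink, here is a minimal sketch, independent of the code that follows; the class, method, and parameter names are hypothetical and only spell out the condition being checked.

public final class FraudRule {
    private static final double SMALL_AMOUNT = 1d;
    private static final double LARGE_AMOUNT = 100d;
    private static final long WINDOW_MS = 5 * 60 * 1000L;

    /**
     * @param smallTxTimeMs time of the last small (below 1 yuan) transaction of this account, or null if none
     * @param amount        amount of the current transaction
     * @param nowMs         current time
     */
    public static boolean isSuspicious(Long smallTxTimeMs, double amount, long nowMs) {
        return smallTxTimeMs != null
                && amount > LARGE_AMOUNT
                && nowMs - smallTxTimeMs <= WINDOW_MS;
    }
}

The Flink program later in this article implements exactly this condition, keeping the "last small transaction" information in keyed state and using a timer to forget it once the window has passed.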
State Management in Flink
Stateful stream processing is the feature the official Flink documentation presents as the most representative of the framework.
What is State?
While many operations in a dataflow simply look at one individual event at a time (for example an event parser), some operations remember information across multiple events (for example window operators). These operations are called stateful. URL
This is the official explanation of state. Roughly: in a dataflow, some operations remember information across multiple events, e.g. window operators, and such operations are called stateful.
The quote says state is what an operation remembers. In Flink this "operation" splits into the parallel task of an operator and the operator itself; both the task and the operator (what we usually call an 算子) are able to record state, and that state is what allows a task to recover its data after a failure.
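As a tiny illustration of the quote, the sketch below (not from the article; the Tuple2 input of hypothetical (userId, 1L) events is made up) uses a keyed running sum. The sum is a stateful operation because it must remember the total per key across events, whereas a simple map would look at each event in isolation.

import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class StatefulCountSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.fromElements(
                Tuple2.of("user-1", 1L),
                Tuple2.of("user-2", 1L),
                Tuple2.of("user-1", 1L))
            .keyBy(new KeySelector<Tuple2<String, Long>, String>() {
                private static final long serialVersionUID = 1L;
                @Override
                public String getKey(Tuple2<String, Long> value) {
                    return value.f0;
                }
            })
            // sum() is stateful: it has to remember the running total for every key
            // across events, unlike a map() that looks at one event in isolation
            .sum(1)
            .print();
        env.execute("stateful count sketch");
    }
}

Running it prints (user-1,1), (user-2,1), (user-1,2): the second user-1 event only produces 2 because the operator remembered the earlier one.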
Types of State in Flink
- Keyed State
  - "Keyed state is maintained in what can be thought of as an embedded key/value store." In other words, keyed state behaves like an embedded key-value store, and it can only be used on partitioned streams, i.e. on a keyed stream.
  - Kinds (a short sketch that registers several of these follows right after this list):
    - ValueState
    - MapState
    - ListState
    - ReducingState
    - AggregatingState
- Operator State
  - This is the task-level state we usually see: each parallel task keeps its own piece of state. E.g. the Kafka consumer: one task normally consumes one Kafka partition, so it needs to record the offset of that partition. (A minimal operator-state sketch appears further below.)
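The sketch below, referenced in the list above, shows how several of these keyed state kinds are declared and used inside a RichFlatMapFunction. It is not from the article: the input type (a plain Double amount on a stream assumed to already be keyed by account) and all names are illustrative only.

import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.api.common.state.MapState;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

public class KeyedStateKindsSketch extends RichFlatMapFunction<Double, String> {
    private static final long serialVersionUID = 1L;
    // One value per key, e.g. the last amount seen for this account
    private transient ValueState<Double> lastAmount;
    // A list per key, e.g. all amounts seen for this account
    private transient ListState<Double> amounts;
    // A map per key, e.g. amount bucket -> count for this account
    private transient MapState<String, Long> bucketCounts;

    @Override
    public void open(Configuration parameters) {
        lastAmount = getRuntimeContext().getState(
                new ValueStateDescriptor<>("lastAmount", Types.DOUBLE));
        amounts = getRuntimeContext().getListState(
                new ListStateDescriptor<>("amounts", Types.DOUBLE));
        bucketCounts = getRuntimeContext().getMapState(
                new MapStateDescriptor<>("bucketCounts", Types.STRING, Types.LONG));
    }

    @Override
    public void flatMap(Double amount, Collector<String> out) throws Exception {
        lastAmount.update(amount);
        amounts.add(amount);
        String bucket = amount > 100d ? "large" : "normal";
        Long current = bucketCounts.get(bucket);
        bucketCounts.put(bucket, current == null ? 1L : current + 1L);
        out.collect("seen " + amount + " in bucket " + bucket);
    }
}

Like every keyed state, these handles can only be obtained in a function that runs downstream of a keyBy; requesting them on a non-keyed stream fails at runtime.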
The official documentation mainly covers keyed state, largely because keyed state is what we use most of the time in day-to-day business logic.
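Before moving on, to make the operator-state item above concrete, here is a minimal sketch (not from the article) of a counting source that keeps its per-task "offset" in operator state via the CheckpointedFunction interface. It only mimics the idea behind the Kafka consumer example; the real connector is far more involved, and every name here is made up.

import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

public class CountingSourceSketch implements SourceFunction<Long>, CheckpointedFunction {
    private static final long serialVersionUID = 1L;
    private volatile boolean running = true;
    // The "offset" this task has emitted so far; restored after a failure
    private long offset = 0L;
    private transient ListState<Long> offsetState;

    @Override
    public void run(SourceContext<Long> ctx) throws Exception {
        while (running) {
            // Hold the checkpoint lock so the emitted record and the offset stay consistent
            synchronized (ctx.getCheckpointLock()) {
                ctx.collect(offset);
                offset++;
            }
            Thread.sleep(1000);
        }
    }

    @Override
    public void cancel() {
        running = false;
    }

    @Override
    public void snapshotState(FunctionSnapshotContext context) throws Exception {
        offsetState.clear();
        offsetState.add(offset);
    }

    @Override
    public void initializeState(FunctionInitializationContext context) throws Exception {
        offsetState = context.getOperatorStateStore().getListState(
                new ListStateDescriptor<>("offset", Types.LONG));
        for (Long restored : offsetState.get()) {
            offset = restored;
        }
    }
}

On recovery, initializeState() hands the task back the offset it had snapshotted, which is exactly the "record the offset of the partition" behaviour described in the Kafka example above.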
Stateful Processing with a Timer
The core logic has already been described above, so let's get straight to the code.
package com.qingshan.practise;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.streaming.api.functions.source.SourceFunction;
import org.apache.flink.util.Collector;

import com.qingshan.source.ReadLineSource;

/**
 *
 * @author qingshanit
 *
 */
public class FraudWithTimerDemo {

    public static void main(String[] args) {
        try {
            // Create the execution context for the job; here it actually builds a local environment
            final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            // Attach the data source
            final DataStream<String> source = env.addSource(new ReadLine2Source());
            // Normalize the data: convert each line of data2.csv into a ComsumeRecord object
            final DataStream<ComsumeRecord> consumeRecordsStream = source.map(
                    new MapFunction<String, ComsumeRecord>() {
                        private static final long serialVersionUID = 1L;

                        @Override
                        public ComsumeRecord map(String line) throws Exception {
                            String[] info = line.split(",");
                            return new ComsumeRecord(info[0], info[1], Double.parseDouble(info[2].trim()), Long.parseLong(info[3].trim()));
                        }
                    });
            // Partition the stream by user id
            final KeyedStream<ComsumeRecord, String> keyedDataStream = consumeRecordsStream.keyBy(new KeySelector<ComsumeRecord, String>() {
                private static final long serialVersionUID = 1L;

                // Return the partition key
                @Override
                public String getKey(ComsumeRecord record) throws Exception {
                    return record.getUserid();
                }
            });
            // Generate alerts via MyKeyedProcessWithTimerFunction
            final DataStream<Alert> alertStream = keyedDataStream.process(new MyKeyedProcessWithTimerFunction());
            // Print to the console
            //alertStream.print();
            alertStream.printToErr();
            // Start the job; do not forget this step!
            env.execute("demo");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
class MyKeyedProcessWithTimerFunction extends KeyedProcessFunction<String, ComsumeRecord, Alert> {

    private static final long serialVersionUID = 1L;

    // Flag state: non-null means the previous transaction of this account was below 1 yuan
    private transient ValueState<Boolean> isFraudState;
    // Timer state: remembers the cleanup timer currently registered for this account
    private transient ValueState<Long> timerState;

    private static final Double LARGE_AMOUNT = 100d;
    private static final Double SMALL_AMOUNT = 1d;

    @Override
    public void open(Configuration parameters) throws Exception {
        // First register a ValueStateDescriptor; registration happens only once, which is why it is done in open()
        ValueStateDescriptor<Boolean> isFraudStateDescriptor = new ValueStateDescriptor<Boolean>("isFraudState", Types.BOOLEAN);
        // Obtain the ValueState from the runtime context, similar to initializing a field
        isFraudState = this.getRuntimeContext().getState(isFraudStateDescriptor);
        // Register a timer state that records the currently registered timer for each key
        ValueStateDescriptor<Long> timerStateDescriptor = new ValueStateDescriptor<Long>("timerState", Types.LONG);
        // Obtain the ValueState from the runtime context, similar to initializing a field
        timerState = this.getRuntimeContext().getState(timerStateDescriptor);
    }

    /**
     * processElement handles one record at a time, while the ValueState is scoped to the
     * current key, i.e. the account we partitioned the stream by with keyBy.
     */
    @Override
    public void processElement(ComsumeRecord record, KeyedProcessFunction<String, ComsumeRecord, Alert>.Context context,
            Collector<Alert> collector) throws Exception {
        // Fetch the state for the current key
        Boolean lastWasSmallAmount = isFraudState.value();
        // A non-null value means the previous record of this account was a transaction below 1 yuan
        if (lastWasSmallAmount != null) {
            if (record.getAmount() > LARGE_AMOUNT) {
                Alert alert = new Alert(record.getUserid(), record.getAmount());
                collector.collect(alert);
            }
            // Delete the pending cleanup timer
            Long timer = timerState.value();
            context.timerService().deleteProcessingTimeTimer(timer);
            // Clean up the state
            timerState.clear();
            isFraudState.clear();
        }
        // If the amount is below SMALL_AMOUNT, set the fraud flag and register a new cleanup timer,
        // which starts the next round of detection
        if (record.getAmount() < SMALL_AMOUNT) {
            isFraudState.update(true);
            // Register a processing-time timer five minutes from the current processing time,
            // matching the five-minute window of the rule, and remember it in timerState
            long timer = context.timerService().currentProcessingTime() + 5 * 60 * 1000;
            context.timerService().registerProcessingTimeTimer(timer);
            timerState.update(timer);
        }
    }

    /**
     * onTimer is invoked when a registered timer fires.
     * Here it implements the cleanup logic: once the configured time window has passed,
     * the information recorded in the state for this key is cleared.
     */
    @Override
    public void onTimer(long timestamp, KeyedProcessFunction<String, ComsumeRecord, Alert>.OnTimerContext ctx,
            Collector<Alert> out) throws Exception {
        // Clear the state
        isFraudState.clear();
        timerState.clear();
    }
}
class ReadLine2Source implements SourceFunction<String> {

    private static final long serialVersionUID = 1L;

    private volatile boolean running = true;

    /**
     * Stop emitting data
     */
    @Override
    public void cancel() {
        running = false;
    }

    /**
     * The logic here simply emits one line of the file every two seconds
     */
    @Override
    public void run(SourceContext<String> context) throws Exception {
        BufferedReader br = null;
        String filePath = ReadLineSource.class.getClassLoader().getResource("./data2.csv").getPath();
        try {
            br = new BufferedReader(new FileReader(new File(filePath)));
            String line = null;
            while (running && (line = br.readLine()) != null) {
                //System.err.println("source output:" + line);
                context.collect(line);
                Thread.sleep(1000 * 2);
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } finally {
            if (br != null) {
                br.close();
            }
        }
    }
}