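1. Introduction
FlinkCEP is a complex event processing (CEP) library implemented on top of Flink. It lets you detect event patterns in an unbounded stream of events, so that you can get hold of what matters in your data. It is commonly used for complex-event scenarios such as risk-control rules over logs of user actions in an app. This article walks through one such requirement in full: raise an alert when a user fails to log in 3 times within 10 seconds.
1.1. Overall requirement and data diagram
2. Official example
The example code from the official documentation is as follows: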
DataStream<Event> input = ...
Pattern<Event, ?> pattern = Pattern.<Event>begin("start").where(
        new SimpleCondition<Event>() {
            @Override
            public boolean filter(Event event) {
                return event.getId() == 42;
            }
        }
    ).next("middle").subtype(SubEvent.class).where(
        new SimpleCondition<SubEvent>() {
            @Override
            public boolean filter(SubEvent subEvent) {
                return subEvent.getVolume() >= 10.0;
            }
        }
    ).followedBy("end").where(
        new SimpleCondition<Event>() {
            @Override
            public boolean filter(Event event) {
                return event.getName().equals("end");
            }
        }
    );
PatternStream<Event> patternStream = CEP.pattern(input, pattern);
DataStream<Alert> result = patternStream.process(
    new PatternProcessFunction<Event, Alert>() {
        @Override
        public void processMatch(
                Map<String, List<Event>> pattern,
                Context ctx,
                Collector<Alert> out) throws Exception {
            out.collect(createAlertFrom(pattern));
        }
    });
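2.1. Summary of the official example
CEP programming steps
a) Define the pattern sequence
Pattern.<Class>begin("patternName").API...
Custom pattern rules are generally built following this template.
The APIs that can be chained afterwards are listed in the official documentation:
Event Processing (CEP) | Apache Flink
b) Apply the pattern sequence to a stream
CEP.pattern(inputDataStream, pattern)
CEP.pattern() is a fixed call form:
the first argument is the stream the pattern is applied to;
the second argument is the custom pattern.
c) Extract the matched data and emit output
The stream produced in b) is processed with the process API: extend PatternProcessFunction and override processMatch(Map<String, List<Event>> pattern, Context ctx, Collector<Alert> out).
The first argument holds the matched data: each Map key is a "patternName" defined in step a), and the value is the list of events matched by that named pattern.
The second argument is the context. It gives access to time information (the timestamp of the match and the current processing time) and can emit records to side outputs. It is not a stream of unmatched data; partial matches that time out are handled by additionally implementing TimedOutPartialMatchHandler.
The third argument is the collector for whatever the function needs to emit downstream after processing.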
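To make the shape of the first argument more concrete, here is a minimal plain-Java sketch that uses no Flink classes. The event strings ("e1", "e2", "e5") and the helper describe are hypothetical and exist only for illustration; only the pattern names "start", "middle" and "end" come from the official example above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for the match map handed to processMatch; not Flink API.
class MatchMapDemo {

    // Renders a match map as "patternName=[events]" pieces, in insertion order.
    static String describe(Map<String, List<String>> match) {
        List<String> parts = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : match.entrySet()) {
            parts.add(entry.getKey() + "=" + entry.getValue());
        }
        return String.join(", ", parts);
    }

    public static void main(String[] args) {
        // One entry per named pattern; each value lists the events that pattern matched.
        Map<String, List<String>> match = new LinkedHashMap<>();
        match.put("start", Arrays.asList("e1"));
        match.put("middle", Arrays.asList("e2"));
        match.put("end", Arrays.asList("e5"));
        System.out.println(describe(match)); // prints: start=[e1], middle=[e2], end=[e5]
    }
}
```

A pattern with a quantifier such as times(3) stores all of its matched events in the list under a single name, which is why the value type is a List rather than a single event.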
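3. The requirement in detail
The following simulates reading user operation logs from a socket and runs CEP matching on them to produce output.
The code below flattens each incoming line into a JavaBean. This chapter explains things snippet by snippet; the complete demo code is listed in a later chapter.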
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// Parallelism is set to 1 so that the watermark can advance and trigger the computation;
// with more subtasks, one that receives no test data would hold the watermark back.
env.setParallelism(1);
DataStreamSource<String> socketTextStream = env.socketTextStream("localhost", 8888);
SingleOutputStreamOperator<UserLoginLog> dataStream = socketTextStream
        .flatMap(new MyFlatMapFunction())
        .assignTimestampsAndWatermarks(
                // Event time comes from loginTime; events may arrive up to 1 s out of order.
                WatermarkStrategy.<UserLoginLog>forBoundedOutOfOrderness(Duration.ofSeconds(1))
                        .withTimestampAssigner((SerializableTimestampAssigner<UserLoginLog>)
                                (element, recordTimestamp) -> element.getLoginTime())
        );
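The watermark setup above determines when an event becomes visible to the pattern. The following is a rough sketch of the arithmetic in plain Java, under two assumptions that are not stated in this article: that the bounded-out-of-orderness watermark equals the largest timestamp seen so far, minus the bound, minus 1 ms, and that in event time the CEP operator only processes an event once the watermark has reached its timestamp. The class and method names are hypothetical.

```java
// Plain-Java sketch of the watermark arithmetic; it does not call Flink.
class WatermarkDemo {

    // Assumed formula: largest timestamp seen so far - allowed out-of-orderness - 1 ms.
    static long watermark(long maxTimestampSeen, long outOfOrdernessMillis) {
        return maxTimestampSeen - outOfOrdernessMillis - 1;
    }

    // Assumed rule: an event is released for matching once the watermark reaches its timestamp.
    static boolean isReleased(long eventTimestamp, long currentWatermark) {
        return eventTimestamp <= currentWatermark;
    }

    public static void main(String[] args) {
        // After the third sample event (loginTime 1645177354000) with a 1 s bound:
        long wm = watermark(1645177354000L, 1000L);
        System.out.println(wm);                             // prints: 1645177352999
        System.out.println(isReleased(1645177352000L, wm)); // prints: true
        System.out.println(isReleased(1645177353000L, wm)); // prints: false
    }
}
```

If these assumptions hold, the last few events typed into the socket stay buffered until later data arrives or the stream ends, which is worth keeping in mind when the final match appears to be missing during a test.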
3.1. Using begin.where.next.where.next
// Emit only when 3 login failures occur within 10 s; strict contiguity is enforced.
Pattern<UserLoginLog, UserLoginLog> wherePatternOne = Pattern.<UserLoginLog>begin("start")
        .where(new SimpleCondition<UserLoginLog>() {
            @Override
            public boolean filter(UserLoginLog value) throws Exception {
                return 1 == value.getLoginStatus();
            }
        })
        .next("second")
        .where(new IterativeCondition<UserLoginLog>() {
            @Override
            public boolean filter(UserLoginLog value, Context<UserLoginLog> ctx) throws Exception {
                return 1 == value.getLoginStatus();
            }
        })
        .next("third")
        .where(new SimpleCondition<UserLoginLog>() {
            @Override
            public boolean filter(UserLoginLog value) throws Exception {
                return 1 == value.getLoginStatus();
            }
        })
        .within(Time.seconds(10));
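With the pattern above, matching starts at a failed login. If the second and third events that immediately follow are also failures, the match is emitted.
// With the log data below as input, the output (the first event of each match) is loginId: 11111, 11112, 11113, 11116, 11117, 11121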
{"loginId":11111,"loginTime":1645177352000,"loginStatus":1,"userName":"aaron"}
{"loginId":11112,"loginTime":1645177353000,"loginStatus":1,"userName":"aaron"}
{"loginId":11113,"loginTime":1645177354000,"loginStatus":1,"userName":"aaron"}
{"loginId":11116,"loginTime":1645177355000,"loginStatus":1,"userName":"aaron"}
{"loginId":11117,"loginTime":1645177356000,"loginStatus":1,"userName":"aaron"}
{"loginId":11118,"loginTime":1645177357000,"loginStatus":1,"userName":"aaron"}
{"loginId":11119,"loginTime":1645177358000,"loginStatus":1,"userName":"aaron"}
{"loginId":11120,"loginTime":1645177359000,"loginStatus":0,"userName":"aaron"}
{"loginId":11121,"loginTime":1645177360000,"loginStatus":1,"userName":"aaron"}
{"loginId":11122,"loginTime":1645177361000,"loginStatus":1,"userName":"aaron"}
{"loginId":11123,"loginTime":1645177362000,"loginStatus":1,"userName":"aaron"}
3.1.1. Output diagram for the requirement
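The output listed for section 3.1 can be cross-checked with a small plain-Java simulation. This is only a sketch of the matching rule as described in this article (three back-to-back failures, with less than 10 s between the first and the last), not the Flink CEP engine itself; the class and method names are hypothetical, and like the demo it reports the first loginId of every match.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation of the strict-contiguity pattern from section 3.1; not Flink API.
class StrictContiguityDemo {
    static final long WINDOW_MS = 10_000L;

    // Rebuilds the sample input as {loginId, loginTime, loginStatus}:
    // one event per second, all failures (1) except loginId 11120 (0).
    static long[][] sampleEvents() {
        long[] ids = {11111, 11112, 11113, 11116, 11117, 11118, 11119, 11120, 11121, 11122, 11123};
        long[][] events = new long[ids.length][];
        for (int i = 0; i < ids.length; i++) {
            long status = ids[i] == 11120 ? 0 : 1;
            events[i] = new long[] {ids[i], 1645177352000L + i * 1000L, status};
        }
        return events;
    }

    // Returns the loginId of the first event of every run of 3 adjacent failures inside the window.
    static List<Long> matchStarts(long[][] events) {
        List<Long> starts = new ArrayList<>();
        for (int i = 0; i + 2 < events.length; i++) {
            boolean allFailed = events[i][2] == 1 && events[i + 1][2] == 1 && events[i + 2][2] == 1;
            boolean inWindow = events[i + 2][1] - events[i][1] < WINDOW_MS;
            if (allFailed && inWindow) {
                starts.add(events[i][0]);
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        // prints: [11111, 11112, 11113, 11116, 11117, 11121]
        System.out.println(matchStarts(sampleEvents()));
    }
}
```

Events 11118 and 11119 start no match because the successful login 11120 breaks the run before a third failure arrives.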
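3.2. Using begin.times
// Emit when 3 login failures occur within 10 s; contiguity is not enforced.
Pattern<UserLoginLog, UserLoginLog> wherePatternTwo = Pattern.<UserLoginLog>begin("start")
        .where(new IterativeCondition<UserLoginLog>() {
            @Override
            public boolean filter(UserLoginLog value, Context<UserLoginLog> ctx) throws Exception {
                return 1 == value.getLoginStatus();
            }
        })
        .times(3)
        .within(Time.seconds(10));
With the pattern above, matching starts at a failed login, and the match is emitted as long as a second and a third failure also appear within 10 seconds. In essence, the failures do not have to be consecutive.
// With the log data below as input, the output (the first event of each match) is loginId: 11111, 11112, 11113, 11116, 11117, 11118, 11119, 11121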
{"loginId":11111,"loginTime":1645177352000,"loginStatus":1,"userName":"aaron"}
{"loginId":11112,"loginTime":1645177353000,"loginStatus":1,"userName":"aaron"}
{"loginId":11113,"loginTime":1645177354000,"loginStatus":1,"userName":"aaron"}
{"loginId":11116,"loginTime":1645177355000,"loginStatus":1,"userName":"aaron"}
{"loginId":11117,"loginTime":1645177356000,"loginStatus":1,"userName":"aaron"}
{"loginId":11118,"loginTime":1645177357000,"loginStatus":1,"userName":"aaron"}
{"loginId":11119,"loginTime":1645177358000,"loginStatus":1,"userName":"aaron"}
{"loginId":11120,"loginTime":1645177359000,"loginStatus":0,"userName":"aaron"}
{"loginId":11121,"loginTime":1645177360000,"loginStatus":1,"userName":"aaron"}
{"loginId":11122,"loginTime":1645177361000,"loginStatus":1,"userName":"aaron"}
{"loginId":11123,"loginTime":1645177362000,"loginStatus":1,"userName":"aaron"}
3.2.1. Requirement diagram
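The relaxed behaviour of section 3.2 can be reproduced the same way. The sketch below is a plain-Java approximation of the rule described in this article (from each failure, take the next two failures while skipping anything else, all within 10 s); it is not the Flink CEP engine, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation of times(3) without consecutive(), as described in section 3.2; not Flink API.
class RelaxedContiguityDemo {
    static final long WINDOW_MS = 10_000L;

    // Same sample input as {loginId, loginTime, loginStatus}; only loginId 11120 succeeds (0).
    static long[][] sampleEvents() {
        long[] ids = {11111, 11112, 11113, 11116, 11117, 11118, 11119, 11120, 11121, 11122, 11123};
        long[][] events = new long[ids.length][];
        for (int i = 0; i < ids.length; i++) {
            long status = ids[i] == 11120 ? 0 : 1;
            events[i] = new long[] {ids[i], 1645177352000L + i * 1000L, status};
        }
        return events;
    }

    // Returns the first loginId of every match: a failure followed by two more failures
    // inside the window, ignoring non-failure events in between.
    static List<Long> matchStarts(long[][] events) {
        List<Long> starts = new ArrayList<>();
        for (int i = 0; i < events.length; i++) {
            if (events[i][2] != 1) {
                continue; // a match can only start at a failure
            }
            int failures = 1;
            for (int j = i + 1; j < events.length && failures < 3; j++) {
                if (events[j][1] - events[i][1] >= WINDOW_MS) {
                    break; // outside the 10 s window
                }
                if (events[j][2] == 1) {
                    failures++;
                }
            }
            if (failures == 3) {
                starts.add(events[i][0]);
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        // prints: [11111, 11112, 11113, 11116, 11117, 11118, 11119, 11121]
        System.out.println(matchStarts(sampleEvents()));
    }
}
```

Compared with the strict variant, 11118 and 11119 now start matches as well, because the successful login 11120 is skipped rather than ending the run.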
3.3. Using begin.times.consecutive
// Emit only when 3 login failures occur within 10 s; adding consecutive() enforces strict contiguity.
Pattern<UserLoginLog, UserLoginLog> wherePatternThree = Pattern.<UserLoginLog>begin("start")
        .where(new IterativeCondition<UserLoginLog>() {
            @Override
            public boolean filter(UserLoginLog value, Context<UserLoginLog> ctx) throws Exception {
                return 1 == value.getLoginStatus();
            }
        })
        .times(3)
        .consecutive()
        .within(Time.seconds(10));
Adding consecutive() to the pattern from 3.2 gives the same result as 3.1.
// With the log data below as input, the output (the first event of each match) is loginId: 11111, 11112, 11113, 11116, 11117, 11121
{"loginId":11111,"loginTime":1645177352000,"loginStatus":1,"userName":"aaron"}
{"loginId":11112,"loginTime":1645177353000,"loginStatus":1,"userName":"aaron"}
{"loginId":11113,"loginTime":1645177354000,"loginStatus":1,"userName":"aaron"}
{"loginId":11116,"loginTime":1645177355000,"loginStatus":1,"userName":"aaron"}
{"loginId":11117,"loginTime":1645177356000,"loginStatus":1,"userName":"aaron"}
{"loginId":11118,"loginTime":1645177357000,"loginStatus":1,"userName":"aaron"}
{"loginId":11119,"loginTime":1645177358000,"loginStatus":1,"userName":"aaron"}
{"loginId":11120,"loginTime":1645177359000,"loginStatus":0,"userName":"aaron"}
{"loginId":11121,"loginTime":1645177360000,"loginStatus":1,"userName":"aaron"}
{"loginId":11122,"loginTime":1645177361000,"loginStatus":1,"userName":"aaron"}
{"loginId":11123,"loginTime":1645177362000,"loginStatus":1,"userName":"aaron"}
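All sample events belong to the same user, so the three patterns are applied to the whole stream. With several users, events from different users interleave, and strict contiguity would then be evaluated across users. The plain-Java sketch below illustrates the effect of grouping by user before matching. It is an illustration only: the user "bob", the Login class and the helper names are hypothetical. In Flink the corresponding step would presumably be to key the stream (for example with keyBy on userName) before calling CEP.pattern, which is an assumption and not part of the demo in this article.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of matching per user versus on the mixed stream; not Flink API.
class PerUserDemo {

    static class Login {
        final String user;
        final long time;
        final int status; // 1 = login failed, 0 = login succeeded

        Login(String user, long time, int status) {
            this.user = user;
            this.time = time;
            this.status = status;
        }
    }

    // Counts runs of 3 adjacent failures with less than 10 s between first and last.
    static int countStrictMatches(List<Login> events) {
        int matches = 0;
        for (int i = 0; i + 2 < events.size(); i++) {
            Login a = events.get(i);
            Login b = events.get(i + 1);
            Login c = events.get(i + 2);
            if (a.status == 1 && b.status == 1 && c.status == 1 && c.time - a.time < 10_000L) {
                matches++;
            }
        }
        return matches;
    }

    // Groups events by user (keeping arrival order) and matches each group separately.
    static Map<String, Integer> countPerUser(List<Login> events) {
        Map<String, List<Login>> byUser = new LinkedHashMap<>();
        for (Login event : events) {
            byUser.computeIfAbsent(event.user, k -> new ArrayList<>()).add(event);
        }
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<Login>> entry : byUser.entrySet()) {
            result.put(entry.getKey(), countStrictMatches(entry.getValue()));
        }
        return result;
    }

    // aaron fails three times while bob's successful logins arrive in between.
    static List<Login> sample() {
        return Arrays.asList(
                new Login("aaron", 1000L, 1),
                new Login("bob", 1500L, 0),
                new Login("aaron", 2000L, 1),
                new Login("bob", 2500L, 0),
                new Login("aaron", 3000L, 1));
    }

    public static void main(String[] args) {
        System.out.println(countStrictMatches(sample())); // prints: 0
        System.out.println(countPerUser(sample()));       // prints: {aaron=1, bob=0}
    }
}
```

On the mixed stream no match is found, while the per-user view detects aaron's three failures in a row.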
4. Complete code of this demo
4.1. pom file
<properties>
    <flink.version>1.14.3</flink.version>
    <hadoop.version>2.7.5</hadoop.version>
    <scala.binary.version>2.11</scala.binary.version>
    <kafka.version>2.4.0</kafka.version>
    <redis.version>3.3.0</redis.version>
    <lombok.version>1.18.6</lombok.version>
    <fastjson.version>1.2.72</fastjson.version>
    <jdk.version>1.8</jdk.version>
</properties>
<dependencyManagement>
    <dependencies>
        <!-- Hadoop dependencies -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>${hadoop.version}</version>
            <exclusions>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>slf4j-log4j12</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>log4j</groupId>
                    <artifactId>log4j</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>slf4j-api</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <!-- Flink dependencies -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-clients_${scala.binary.version}</artifactId>
            <version>${flink.version}</version>
            <exclusions>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>slf4j-api</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-java</artifactId>
            <version>${flink.version}</version>
            <exclusions>
                <exclusion>
                    <groupId>log4j</groupId>
                    <artifactId>*</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>slf4j-log4j12</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
            <version>${flink.version}</version>
            <exclusions>
                <exclusion>
                    <groupId>log4j</groupId>
                    <artifactId>*</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>slf4j-log4j12</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>com.google.code.findbugs</groupId>
                    <artifactId>jsr305</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.apache.flink</groupId>
                    <artifactId>force-shading</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-statebackend-rocksdb_${scala.binary.version}</artifactId>
            <version>${flink.version}</version>
            <exclusions>
                <exclusion>
                    <groupId>log4j</groupId>
                    <artifactId>*</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>slf4j-log4j12</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-runtime-web_${scala.binary.version}</artifactId>
            <version>${flink.version}</version>
            <scope>provided</scope>
        </dependency>
        <!-- Kafka dependencies -->
        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka-clients</artifactId>
            <version>${kafka.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-kafka_${scala.binary.version}</artifactId>
            <version>${flink.version}</version>
            <exclusions>
                <exclusion>
                    <groupId>log4j</groupId>
                    <artifactId>*</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>slf4j-log4j12</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <!-- Redis dependency -->
        <dependency>
            <groupId>redis.clients</groupId>
            <artifactId>jedis</artifactId>
            <version>${redis.version}</version>
        </dependency>
        <!-- Lombok -->
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <version>${lombok.version}</version>
            <scope>provided</scope>
        </dependency>
        <!-- fastjson -->
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>${fastjson.version}</version>
        </dependency>
        <!-- Flink CEP -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-cep_${scala.binary.version}</artifactId>
            <version>${flink.version}</version>
        </dependency>
    </dependencies>
</dependencyManagement>
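4.2. UserLoginLog class
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

// Declared public, with a no-argument constructor, so that Flink can treat it as a POJO.
@Data
@AllArgsConstructor
@NoArgsConstructor
public class UserLoginLog {
    /**
     * Login id
     */
    private int loginId;
    /**
     * Login time (epoch milliseconds), used as the event timestamp
     */
    private long loginTime;
    /**
     * Login status: 1 = login failed, 0 = login succeeded
     */
    private int loginStatus;
    /**
     * Login user name
     */
    private String userName;
}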
4.3. MyFlatMapFunction class
import com.alibaba.fastjson.JSONObject;
import lombok.extern.slf4j.Slf4j;
import org.apache.commons.lang.StringUtils;
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.util.Collector;

@Slf4j
public class MyFlatMapFunction implements FlatMapFunction<String, UserLoginLog> {
    /**
     * The core method of the FlatMapFunction. Takes an element from the input data set and
     * transforms it into zero, one, or more elements.
     *
     * @param value The input value.
     * @param out The collector for returning result values.
     * @throws Exception This method may throw exceptions. Throwing an exception will cause the
     *     operation to fail and may trigger recovery.
     */
    @Override
    public void flatMap(String value, Collector<UserLoginLog> out) throws Exception {
        // Skip blank lines; every other line is parsed as one JSON-encoded UserLoginLog.
        if (StringUtils.isNotBlank(value)) {
            UserLoginLog userLoginLog = JSONObject.parseObject(value, UserLoginLog.class);
            out.collect(userLoginLog);
        }
    }
}
4.4. MyPatternProcessFunction class
import lombok.extern.slf4j.Slf4j;
import org.apache.flink.cep.functions.PatternProcessFunction;
import org.apache.flink.util.Collector;
import java.util.List;
import java.util.Map;

@Slf4j
public class MyPatternProcessFunction extends PatternProcessFunction<UserLoginLog, UserLoginLog> {
    /**
     * Generates resulting elements given a map of detected pattern events. The events are
     * identified by their specified names.
     *
     * <p>{@link Context#timestamp()} in this case returns the time of the
     * last element that was assigned to the match, resulting in this partial match being finished.
     *
     * @param match map containing the found pattern. Events are identified by their names.
     * @param ctx enables access to time features and emitting results through side outputs
     * @param out Collector used to output the generated elements
     * @throws Exception This method may throw exceptions. Throwing an exception will cause the
     *     operation to fail and may trigger recovery.
     */
    @Override
    public void processMatch(Map<String, List<UserLoginLog>> match, Context ctx, Collector<UserLoginLog> out) throws Exception {
        // Emit only the first event matched under the name "start", i.e. the first failure of each match.
        List<UserLoginLog> start = match.get("start");
        out.collect(start.get(0));
    }
}
4.5. Main class
import lombok.extern.slf4j.Slf4j;
import org.apache.flink.api.common.eventtime.*;
import org.apache.flink.cep.CEP;
import org.apache.flink.cep.PatternStream;
import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.cep.pattern.conditions.IterativeCondition;
import org.apache.flink.cep.pattern.conditions.SimpleCondition;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import java.time.Duration;

@Slf4j
public class CepLearning {
    public static void main(String[] args) {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Parallelism is set to 1 so that the watermark can advance and trigger the computation.
        env.setParallelism(1);
        DataStreamSource<String> socketTextStream = env.socketTextStream("localhost", 8888);
        SingleOutputStreamOperator<UserLoginLog> dataStream = socketTextStream
                .flatMap(new MyFlatMapFunction())
                .assignTimestampsAndWatermarks(
                        WatermarkStrategy.<UserLoginLog>forBoundedOutOfOrderness(Duration.ofSeconds(1))
                                .withTimestampAssigner((SerializableTimestampAssigner<UserLoginLog>)
                                        (element, recordTimestamp) -> element.getLoginTime())
                );

        // Pattern 1: 3 login failures within 10 s, strict contiguity.
        Pattern<UserLoginLog, UserLoginLog> wherePatternOne = Pattern.<UserLoginLog>begin("start")
                .where(new SimpleCondition<UserLoginLog>() {
                    @Override
                    public boolean filter(UserLoginLog value) throws Exception {
                        return 1 == value.getLoginStatus();
                    }
                })
                .next("second")
                .where(new IterativeCondition<UserLoginLog>() {
                    @Override
                    public boolean filter(UserLoginLog value, Context<UserLoginLog> ctx) throws Exception {
                        return 1 == value.getLoginStatus();
                    }
                })
                .next("third")
                .where(new SimpleCondition<UserLoginLog>() {
                    @Override
                    public boolean filter(UserLoginLog value) throws Exception {
                        return 1 == value.getLoginStatus();
                    }
                })
                .within(Time.seconds(10));

        // Pattern 2: 3 login failures within 10 s, contiguity not enforced.
        Pattern<UserLoginLog, UserLoginLog> wherePatternTwo = Pattern.<UserLoginLog>begin("start")
                .where(new IterativeCondition<UserLoginLog>() {
                    @Override
                    public boolean filter(UserLoginLog value, Context<UserLoginLog> ctx) throws Exception {
                        return 1 == value.getLoginStatus();
                    }
                })
                .times(3)
                .within(Time.seconds(10));

        // Pattern 3: same as pattern 2, but consecutive() enforces strict contiguity.
        Pattern<UserLoginLog, UserLoginLog> wherePatternThree = Pattern.<UserLoginLog>begin("start")
                .where(new IterativeCondition<UserLoginLog>() {
                    @Override
                    public boolean filter(UserLoginLog value, Context<UserLoginLog> ctx) throws Exception {
                        return 1 == value.getLoginStatus();
                    }
                })
                .times(3)
                .consecutive()
                .within(Time.seconds(10));

        // Apply each pattern to the same input stream.
        PatternStream<UserLoginLog> patternStream = CEP.pattern(dataStream, wherePatternOne);
        PatternStream<UserLoginLog> patternStream1 = CEP.pattern(dataStream, wherePatternTwo);
        PatternStream<UserLoginLog> patternStream2 = CEP.pattern(dataStream, wherePatternThree);

        // Extract the first event of every match and print it with a label per pattern.
        SingleOutputStreamOperator<UserLoginLog> process = patternStream.process(new MyPatternProcessFunction());
        SingleOutputStreamOperator<UserLoginLog> process1 = patternStream1.process(new MyPatternProcessFunction());
        SingleOutputStreamOperator<UserLoginLog> process2 = patternStream2.process(new MyPatternProcessFunction());
        process.print("resultOutPut");
        process1.print("resultOutPutTwo");
        process2.print("resultOutPutThree");

        try {
            env.execute();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}