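A simple Flink CEP example: detecting the order in which users visit pages
CEP (complex event processing) is a library that Flink introduced early on. It matches an event stream against rule-based patterns in order to detect abnormal behavior, for example crawler traffic or coupon abuse during promotions.
The project background is described briefly below, followed by the pattern detection built with CEP.
Requirement:
The company is running a coupon campaign for its members. To stop bonus hunters from abusing it, Flink CEP is used for anomaly detection. The rule: if the same device ID (the driverId field in the code) visits the login page -> my page -> ling quan (coupon claim) page in that order more than 5 times within 5 minutes, the matching data is printed to the console.
A Python script is used here to simulate the user behavior logs.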
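To make the rule concrete before turning to Flink, here is a minimal plain-Python sketch of the same check. It is only an illustration under stated assumptions, not how Flink CEP works internally: the function name `suspicious_devices` is hypothetical, unrelated pages between the three steps are simply skipped, "more than 5 times" is treated as "at least 5" to mirror `timesOrMore(5)`, and events are assumed to arrive in timestamp order per device.

```python
from collections import defaultdict, deque

WINDOW_MS = 5 * 60 * 1000                 # 5-minute window from the requirement
SEQUENCE = ("login", "my", "ling quan")   # pages that must be visited in this order
MIN_REPEATS = 5                           # assumption: ">= 5", mirroring timesOrMore(5)


def suspicious_devices(events):
    """Return the set of driverIds that completed the page sequence
    at least MIN_REPEATS times within WINDOW_MS.

    `events` is an iterable of dicts with keys driverId, event, timestamp (ms),
    ordered by timestamp for each device.
    """
    step = defaultdict(int)       # device -> index of the next expected page
    started_at = {}               # device -> timestamp of the current sequence's login
    starts = defaultdict(deque)   # device -> start times of completed sequences
    flagged = set()
    for e in events:
        dev, page, ts = e["driverId"], e["event"], e["timestamp"]
        if page != SEQUENCE[step[dev]]:
            continue              # simplification: pages that do not advance the sequence are skipped
        if step[dev] == 0:
            started_at[dev] = ts
        step[dev] += 1
        if step[dev] == len(SEQUENCE):
            step[dev] = 0
            done = starts[dev]
            done.append(started_at[dev])
            # Drop sequences that started more than WINDOW_MS before this completion
            while done and ts - done[0] > WINDOW_MS:
                done.popleft()
            if len(done) >= MIN_REPEATS:
                flagged.add(dev)
    return flagged
```

A device that runs through login, my and ling quan five times in a few seconds ends up in the returned set, while a device whose five passes are spread over more than 5 minutes does not.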
1. Flink CEP detection code:
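import com.google.gson.Gson;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.cep.CEP;
import org.apache.flink.cep.PatternStream;
import org.apache.flink.cep.functions.PatternProcessFunction;
import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.cep.pattern.conditions.SimpleCondition;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.timestamp.AscendingTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
import org.apache.flink.util.Collector;

import java.util.List;
import java.util.Map;
import java.util.Properties;

public class CEPDemo {

    // Assumed shape of the Event POJO (not shown in the original post),
    // inferred from the JSON messages that the producer script sends.
    public static class Event {
        public String driverId;
        public String event;
        public long timestamp;
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment sEnv = StreamExecutionEnvironment.getExecutionEnvironment();
        sEnv.setParallelism(1);

        // Kafka consumer settings
        Properties p = new Properties();
        p.setProperty("bootstrap.servers", "localhost:9092");
        p.setProperty("group.id", "test");
        DataStreamSource<String> ds = sEnv.addSource(new FlinkKafkaConsumer09<String>("cep", new SimpleStringSchema(), p));

        // Parse each JSON message into an Event, assign timestamps, and key the stream by device
        KeyedStream<Event, String> keyedStream = ds
                .map(new MapFunction<String, Event>() {
                    @Override
                    public Event map(String value) throws Exception {
                        return new Gson().fromJson(value, Event.class);
                    }
                })
                .assignTimestampsAndWatermarks(new AscendingTimestampExtractor<Event>() {
                    @Override
                    public long extractAscendingTimestamp(Event element) {
                        return element.timestamp;
                    }
                }).keyBy(new KeySelector<Event, String>() {
                    @Override
                    public String getKey(Event value) throws Exception {
                        return value.driverId;
                    }
                });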
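        // One login -> my -> ling quan pass. next() requires "my" to directly follow "login";
        // followedBy() allows other pages before "ling quan".
        Pattern<Event, Event> visit = Pattern.<Event>begin("first")
                .where(new SimpleCondition<Event>() {
                    @Override
                    public boolean filter(Event value) throws Exception {
                        return "login".equals(value.event);
                    }
                })
                .next("second").where(new SimpleCondition<Event>() {
                    @Override
                    public boolean filter(Event value) throws Exception {
                        return "my".equals(value.event);
                    }
                })
                .followedBy("end").where(new SimpleCondition<Event>() {
                    @Override
                    public boolean filter(Event value) throws Exception {
                        return "ling quan".equals(value.event);
                    }
                });
        // Wrap the pass in a group pattern so that the quantifier repeats the whole sequence.
        // Chained directly after "end", timesOrMore(5) would only repeat the last step.
        Pattern<Event, Event> pattern = Pattern.begin(visit)
                .timesOrMore(5)            // at least 5 passes (use 6 for strictly more than 5)
                .within(Time.minutes(5));  // all within 5 minutes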
        // Apply the pattern to the keyed stream and print every match
        PatternStream<Event> patternStream = CEP.pattern(keyedStream, pattern);
        patternStream.process(new PatternProcessFunction<Event, String>() {
            @Override
            public void processMatch(Map<String, List<Event>> match, Context ctx, Collector<String> out) throws Exception {
                out.collect(match.toString());
            }
        }).print();
        sEnv.execute("CEP");
    }
}
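The pattern uses both next() and followedBy(), which differ in how strictly two steps must follow each other. The plain-Python sketch below only illustrates that difference on a list of page names for a single device. The helper names `has_strict` and `has_relaxed` are hypothetical, and the functions are a simplification for intuition rather than a model of Flink's matching engine.

```python
def has_strict(pages, first, second):
    """True if `second` appears immediately after `first`.

    This mirrors the idea behind next(): no other event may sit in between.
    """
    return any(a == first and b == second for a, b in zip(pages, pages[1:]))


def has_relaxed(pages, first, second):
    """True if `second` appears anywhere after an occurrence of `first`.

    This mirrors the idea behind followedBy(): other events in between are skipped.
    """
    if first not in pages:
        return False
    return second in pages[pages.index(first) + 1:]


visits = ["login", "search", "my", "ling quan"]
print(has_strict(visits, "login", "my"))    # False: "search" sits between them
print(has_relaxed(visits, "login", "my"))   # True: "my" comes later
```

In the job above, this means a "search" between login and my breaks a pass, while extra pages between my and ling quan do not.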
2. Python script that simulates the Kafka producer
import random
import time

from kafka import KafkaProducer  # provided by the kafka-python package

if __name__ == '__main__':
    driver_ids = ["1001", "1002", "1003", "1004", "1005", "1006", "1007", "1008", "1009"]
    events = ["register", "login", "my", "search", "list", "detail", "order", "ling quan"]
    p = KafkaProducer(bootstrap_servers="localhost:9092")
    while True:
        # Pick a random device and a random page for each simulated visit
        driver_id = random.choice(driver_ids)
        event = random.choice(events)
        timestamp = int(time.time() * 1000)  # event time in milliseconds
        v = '{"driverId":"%s","event":"%s","timestamp":%s}' % (driver_id, event, timestamp)
        print(v)
        p.send("cep", v.encode("utf-8"))
        p.flush()
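Because the script picks devices and pages at random, it can take a while before one device produces enough complete passes to trigger the rule. For a quicker check, a deterministic batch of messages can be built by hand. The sketch below is one possible way to do that: the helper name `burst` and its parameters are hypothetical, and it only builds JSON strings with the same three fields as the script, without touching Kafka.

```python
import json

SEQUENCE = ["login", "my", "ling quan"]  # the page order the rule looks for


def burst(driver_id, repeats=5, start_ms=0, gap_ms=1000):
    """Build JSON messages in which one device repeats the full page
    sequence `repeats` times, with `gap_ms` between consecutive events."""
    messages = []
    ts = start_ms
    for _ in range(repeats):
        for page in SEQUENCE:
            messages.append(json.dumps(
                {"driverId": driver_id, "event": page, "timestamp": ts}))
            ts += gap_ms
    return messages
```

Each returned string could then be sent in the same way as in the script, for example with p.send("cep", msg.encode("utf-8")), using a start_ms taken from the current time in milliseconds.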