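1. Strict contiguity
All matching events are expected to appear strictly one after another, with no non-matching event in between. Example: find a letter b immediately followed by another b. Given the input a,b,c,b,b, the next pattern produces exactly one match, {b,b} (the last two b's).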
import org.apache.flink.cep.CEP;
import org.apache.flink.cep.PatternFlatSelectFunction;
import org.apache.flink.cep.PatternStream;
import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.cep.pattern.conditions.SimpleCondition;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class SocketStreamingWordCount {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStreamSource<String> text = env.readTextFile("E:\\work\\code\\txt\\tmp\\20210909\\test1.txt");
        Pattern<String, String> pattern = Pattern.<String>begin("start")
                .where(new SimpleCondition<String>() {
                    @Override
                    public boolean filter(String value) throws Exception {
                        return "world".equals(value);
                    }
                })
                // next(): "end" must be the event immediately after "start"
                .next("end")
                .where(new SimpleCondition<String>() {
                    @Override
                    public boolean filter(String value) throws Exception {
                        return "world".equals(value);
                    }
                });
        PatternStream<String> patternStream = CEP.pattern(text, pattern);
        SingleOutputStreamOperator<String> result = patternStream.flatSelect(new PatternFlatSelectFunction<String, String>() {
            @Override
            public void flatSelect(Map<String, List<String>> map, Collector<String> collector) throws Exception {
                // Emit the event bound to "start" once per match
                Iterator<String> start = map.get("start").iterator();
                while (start.hasNext()) {
                    collector.collect(start.next());
                }
            }
        });
        result.print().setParallelism(1);
        // Note: Flink evaluates lazily, so the code above only runs once execute() is called
        env.execute("streaming flink cep");
    }
}
Contents of test1.txt:
hello
world
world
world
hello
world
Output:
world
world
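To make the two output lines easier to check by hand, here is a minimal plain-Java sketch of the strict rule. It is only an illustration of the semantics described above, not how Flink CEP works internally; the class name StrictContiguityDemo and the method strictMatches are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: a hand-rolled model of strict contiguity, not Flink code.
class StrictContiguityDemo {

    // Returns the index pairs (start, end) where both events equal `target`
    // and the second event comes immediately after the first.
    static List<int[]> strictMatches(List<String> events, String target) {
        List<int[]> matches = new ArrayList<>();
        for (int i = 0; i + 1 < events.size(); i++) {
            if (target.equals(events.get(i)) && target.equals(events.get(i + 1))) {
                matches.add(new int[] {i, i + 1});
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> letters = Arrays.asList("a", "b", "c", "b", "b");
        List<String> file = Arrays.asList("hello", "world", "world", "world", "hello", "world");
        System.out.println(strictMatches(letters, "b").size());
        System.out.println(strictMatches(file, "world").size());
    }
}
```

Under these assumptions the sketch finds 1 match for a,b,c,b,b and 2 matches for the contents of test1.txt, which agrees with the two world lines printed by the job.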
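2. Relaxed contiguity
Non-matching events that appear between matching events are ignored. Example: find a letter b that is followed by another b. Given the input a,b,c,b,b, the followedBy pattern matches {b,b} twice: the first b with the second (skipping c), and the second b with the third.
Code: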
import org.apache.flink.cep.CEP;
import org.apache.flink.cep.PatternFlatSelectFunction;
import org.apache.flink.cep.PatternStream;
import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.cep.pattern.conditions.SimpleCondition;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class SocketStreamingWordCount {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStreamSource<String> text = env.readTextFile("E:\\work\\code\\txt\\tmp\\20210909\\test1.txt");
        Pattern<String, String> pattern = Pattern.<String>begin("start")
                .where(new SimpleCondition<String>() {
                    @Override
                    public boolean filter(String value) throws Exception {
                        return "world".equals(value);
                    }
                })
                // followedBy(): non-matching events between "start" and "end" are skipped
                .followedBy("end")
                .where(new SimpleCondition<String>() {
                    @Override
                    public boolean filter(String value) throws Exception {
                        return "world".equals(value);
                    }
                });
        PatternStream<String> patternStream = CEP.pattern(text, pattern);
        SingleOutputStreamOperator<String> result = patternStream.flatSelect(new PatternFlatSelectFunction<String, String>() {
            @Override
            public void flatSelect(Map<String, List<String>> map, Collector<String> collector) throws Exception {
                // Emit the event bound to "start" once per match
                Iterator<String> start = map.get("start").iterator();
                while (start.hasNext()) {
                    collector.collect(start.next());
                }
            }
        });
        result.print().setParallelism(1);
        // Note: Flink evaluates lazily, so the code above only runs once execute() is called
        env.execute("streaming flink cep");
    }
}
Contents of test1.txt:
hello
world
world
world
hello
world
Output:
world
world
world
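The three output lines can be reproduced with a small plain-Java model of the relaxed rule: each matching start event is paired with the first later matching event, and anything in between is skipped. This is a sketch of the semantics only, assuming Flink's default behaviour of reporting every match; RelaxedContiguityDemo and relaxedMatches are hypothetical names, not part of the Flink API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: a hand-rolled model of relaxed contiguity, not Flink code.
class RelaxedContiguityDemo {

    // For every event equal to `target`, pairs it with the first later event
    // that also equals `target`, ignoring non-matching events in between.
    static List<int[]> relaxedMatches(List<String> events, String target) {
        List<int[]> matches = new ArrayList<>();
        for (int i = 0; i < events.size(); i++) {
            if (!target.equals(events.get(i))) {
                continue;
            }
            for (int j = i + 1; j < events.size(); j++) {
                if (target.equals(events.get(j))) {
                    matches.add(new int[] {i, j});
                    break; // only the first matching "end" counts
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> letters = Arrays.asList("a", "b", "c", "b", "b");
        List<String> file = Arrays.asList("hello", "world", "world", "world", "hello", "world");
        System.out.println(relaxedMatches(letters, "b").size());
        System.out.println(relaxedMatches(file, "world").size());
    }
}
```

The sketch finds 2 matches for a,b,c,b,b and 3 for test1.txt. The extra match compared with strict contiguity is the third world paired with the last world, with the hello between them skipped.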
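3. Non-deterministic relaxed contiguity
This relaxes contiguity one step further: it also allows additional matches that skip over some matching events.
Example: find a letter b that is followed by another b. Given the input a,b,c,b,b, the followedByAny pattern matches {b,b} three times, once for every ordered pair of b's.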
import org.apache.flink.cep.CEP;
import org.apache.flink.cep.PatternFlatSelectFunction;
import org.apache.flink.cep.PatternStream;
import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.cep.pattern.conditions.SimpleCondition;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class SocketStreamingWordCount {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStreamSource<String> text = env.readTextFile("E:\\work\\code\\txt\\tmp\\20210909\\test1.txt");
        Pattern<String, String> pattern = Pattern.<String>begin("start")
                .where(new SimpleCondition<String>() {
                    @Override
                    public boolean filter(String value) throws Exception {
                        return "world".equals(value);
                    }
                })
                // followedByAny(): every later matching event may serve as "end"
                .followedByAny("end")
                .where(new SimpleCondition<String>() {
                    @Override
                    public boolean filter(String value) throws Exception {
                        return "world".equals(value);
                    }
                });
        PatternStream<String> patternStream = CEP.pattern(text, pattern);
        SingleOutputStreamOperator<String> result = patternStream.flatSelect(new PatternFlatSelectFunction<String, String>() {
            @Override
            public void flatSelect(Map<String, List<String>> map, Collector<String> collector) throws Exception {
                // Emit the event bound to "start" once per match
                Iterator<String> start = map.get("start").iterator();
                while (start.hasNext()) {
                    collector.collect(start.next());
                }
            }
        });
        result.print().setParallelism(1);
        // Note: Flink evaluates lazily, so the code above only runs once execute() is called
        env.execute("streaming flink cep");
    }
}
Contents of test1.txt:
hello
world
world
world
hello
world
Output:
world
world
world
world
world
world
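Why six lines? A rough plain-Java model of the non-deterministic rule pairs every matching event with every later matching event, so four world lines give 4 * 3 / 2 = 6 pairs. As before, this is only a sketch of the semantics, assuming every match is reported; AnyContiguityDemo and anyMatches are hypothetical names and not Flink API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: a hand-rolled model of non-deterministic relaxed
// contiguity, not Flink code.
class AnyContiguityDemo {

    // Pairs every event equal to `target` with every later event that also
    // equals `target`, so matching events may themselves be skipped over.
    static List<int[]> anyMatches(List<String> events, String target) {
        List<int[]> matches = new ArrayList<>();
        for (int i = 0; i < events.size(); i++) {
            if (!target.equals(events.get(i))) {
                continue;
            }
            for (int j = i + 1; j < events.size(); j++) {
                if (target.equals(events.get(j))) {
                    matches.add(new int[] {i, j}); // no break: keep looking
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> letters = Arrays.asList("a", "b", "c", "b", "b");
        List<String> file = Arrays.asList("hello", "world", "world", "world", "hello", "world");
        System.out.println(anyMatches(letters, "b").size());
        System.out.println(anyMatches(file, "world").size());
    }
}
```

The sketch finds 3 matches for a,b,c,b,b and 6 for test1.txt. The only difference from the relaxed model is the missing break in the inner loop, which is the whole distinction between followedBy and followedByAny in this two-step pattern.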