A simple example of Flink DataStream's iterate operator
Since a DataStream program may never finish, there is no maximum number of iterations. Instead, you have to specify which part of the stream is fed back into the iteration and which part is forwarded downstream, using either a split transformation or a filter.
Here is an example:
import com.google.gson.Gson;
import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.collector.selector.OutputSelector;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.IterativeStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.datastream.SplitStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;

import java.util.ArrayList;
import java.util.Properties;

public class IterateOperator {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment sEnv = StreamExecutionEnvironment.getExecutionEnvironment();
        sEnv.setParallelism(1);

        Properties p = new Properties();
        p.setProperty("bootstrap.servers", "localhost:9092");
        DataStreamSource<String> source = sEnv.addSource(
                new FlinkKafkaConsumer010<String>("people", new SimpleStringSchema(), p));

        // Parse each Kafka message into a People object and start an iteration.
        IterativeStream<People> iterate = source.map(new MapFunction<String, People>() {
            @Override
            public People map(String value) throws Exception {
                return new Gson().fromJson(value, People.class);
            }
        }).iterate();

        // Variant 1: select the feedback part of the stream with a filter.
        SingleOutputStreamOperator<People> feedback = iterate.filter(new FilterFunction<People>() {
            @Override
            public boolean filter(People value) throws Exception {
                return "caocao".equals(value.name());
            }
        });
        // Records matching the feedback condition (e.g. name == "caocao")
        // are fed back into the iteration and printed over and over.
        feedback.print("feedback:");
        iterate.closeWith(feedback);

        // Everything else leaves the iteration and flows downstream.
        SingleOutputStreamOperator<People> result = iterate.filter(new FilterFunction<People>() {
            @Override
            public boolean filter(People value) throws Exception {
                return !"caocao".equals(value.name());
            }
        });
        result.print("result:");

        // Variant 2: select the feedback part of the stream with split.
        SplitStream<People> split = iterate.split(new OutputSelector<People>() {
            @Override
            public Iterable<String> select(People value) {
                ArrayList<String> list = new ArrayList<>();
                if ("male".equals(value.sex())) {
                    list.add("male");
                } else {
                    list.add("female");
                }
                return list;
            }
        });
        DataStream<People> male = split.select("male");
        male.print("male:");
        iterate.closeWith(male);

        sEnv.execute("IterateOperator");
    }
}
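The People class is not shown in the original. Below is a minimal sketch of what it could look like, assuming method-style accessors (`name()`, `sex()`) matching the calls in the example, and field names that line up with the JSON keys so Gson can populate them by reflection; the constructor is only a convenience for local testing:

```java
// Hypothetical sketch of the People POJO assumed by the example above.
// Field names match the JSON keys ("name", "age", "sex") so that Gson
// can fill them in; the accessors use the method-style names called
// in the pipeline (value.name(), value.sex()).
class People {
    private String name;
    private int age;
    private String sex;

    // A no-arg constructor is enough for Gson; this convenience
    // constructor exists only for local testing.
    public People() {}

    public People(String name, int age, String sex) {
        this.name = name;
        this.age = age;
        this.sex = sex;
    }

    public String name() { return name; }
    public int age() { return age; }
    public String sex() { return sex; }
}
```

Without standard getters and setters Flink will treat this as a generic type and fall back to Kryo serialization, which is fine for a demo like this one.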
On the Kafka producer side, send {"name":"caocao","age":18,"sex":"male"}; you will see the "feedback:" stream and the "male:" stream printing the record continuously as it keeps iterating.
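To make that endless feedback behavior concrete without a Kafka setup, it can be simulated with a plain work queue: any element matching the feedback predicate is re-enqueued, so a matching record would circulate forever. The sketch below (plain Java, no Flink API; the class and method names are illustrative) caps the loop to keep it finite:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Plain-Java simulation of the iterate/closeWith feedback semantics:
// elements matching the feedback predicate are pushed back into the
// queue, the way closeWith(feedback) re-injects records into the
// iteration head. An artificial cap keeps this demo finite.
class FeedbackSimulation {
    static int simulate(String name, int maxRounds) {
        Deque<String> queue = new ArrayDeque<>();
        queue.add(name);
        int rounds = 0;
        while (!queue.isEmpty() && rounds < maxRounds) {
            String current = queue.poll();
            if ("caocao".equals(current)) {
                // Matches the feedback filter: re-enter the iteration.
                queue.add(current);
                rounds++;
            }
            // Non-matching records simply leave the loop
            // (the "result" branch in the Flink example).
        }
        return rounds;
    }

    public static void main(String[] args) {
        // "caocao" keeps looping until the artificial cap stops it.
        System.out.println(simulate("caocao", 5)); // 5
        // Any other name exits the loop immediately.
        System.out.println(simulate("liubei", 5)); // 0
    }
}
```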