1. Union: input DataStream, output DataStream. This operator merges two or more DataStreams; the element types of the merged streams must be identical. The examples below continue to build on the earlier Flink-to-Kafka connection code, where `input` is the DataStream<String> read from Kafka.
input.print();
SingleOutputStreamOperator<Tuple2<String, Integer>> map = input.map(new MapFunction<String, Tuple2<String, Integer>>() {
    @Override
    public Tuple2<String, Integer> map(String s) throws Exception {
        return new Tuple2<>(s, 1);
    }
});
SingleOutputStreamOperator<Tuple2<String, Integer>> map1 = input.map(new MapFunction<String, Tuple2<String, Integer>>() {
    @Override
    public Tuple2<String, Integer> map(String s) throws Exception {
        return new Tuple2<>(s, 2);
    }
});
map.union(map1).print();
Kafka input:
Program output:
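Outside Flink, the semantics of union can be sketched in plain Java (the class name `UnionSketch` is made up for illustration): two same-typed element sequences are simply merged into one, keeping duplicates, with no ordering guarantee in Flink beyond arrival order.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Plain-Java sketch of union semantics: two sequences with the same
// element type are merged into one; duplicates are kept and nothing
// is reordered or deduplicated.
public class UnionSketch {
    public static List<String> union(List<String> a, List<String> b) {
        return Stream.concat(a.stream(), b.stream()).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> merged = union(Arrays.asList("(a,1)", "(b,1)"),
                                    Arrays.asList("(a,2)", "(b,2)"));
        System.out.println(merged); // four elements, duplicates kept
    }
}
```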
2. Connect: input DataStream, output ConnectedStreams. Unlike union, this operator can combine two DataStreams with different element types. However, a ConnectedStreams cannot be consumed directly (e.g. with print()); it must first be converted into a DataStream with a single element type, here via a CoMapFunction.
input.print();
SingleOutputStreamOperator<Tuple2<String, Integer>> map = input.map(new MapFunction<String, Tuple2<String, Integer>>() {
    @Override
    public Tuple2<String, Integer> map(String s) throws Exception {
        return new Tuple2<>(s, 1);
    }
});
SingleOutputStreamOperator<Tuple2<Integer, String>> map1 = input.map(new MapFunction<String, Tuple2<Integer, String>>() {
    @Override
    public Tuple2<Integer, String> map(String s) throws Exception {
        return new Tuple2<>(2, s);
    }
});
SingleOutputStreamOperator<Tuple2<String, Integer>> map2 = map.connect(map1).map(new CoMapFunction<Tuple2<String, Integer>, Tuple2<Integer, String>, Tuple2<String, Integer>>() {
    int num = 3;

    @Override
    public Tuple2<String, Integer> map1(Tuple2<String, Integer> value) throws Exception {
        return new Tuple2<>(value.f0, num);
    }

    @Override
    public Tuple2<String, Integer> map2(Tuple2<Integer, String> value) throws Exception {
        return new Tuple2<>(value.f1, num);
    }
});
map2.print();
Kafka input:
Program output:
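The core idea of CoMapFunction can be sketched without Flink (class name `CoMapSketch` invented here): two differently typed inputs are handled by two separate methods that both produce the same output type, and both methods can see the same instance state (`num` above).

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Plain-Java sketch of CoMapFunction semantics: two differently typed
// inputs are mapped by two methods into one common output type, and
// both methods share the same state (num).
public class CoMapSketch {
    private int num = 3; // shared between both map methods

    // side 1: (String, Integer) input -> (String, Integer) output
    public Map.Entry<String, Integer> map1(String f0, int f1) {
        return new SimpleEntry<>(f0, num);
    }

    // side 2: (Integer, String) input -> (String, Integer) output
    public Map.Entry<String, Integer> map2(int f0, String f1) {
        return new SimpleEntry<>(f1, num);
    }
}
```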
3. Split/Select: the Split operator takes a DataStream and returns a SplitStream. Split only tags elements; the Select operator is then needed to pick out the DataStream(s) carrying a specific tag.
input.print();
SingleOutputStreamOperator<Tuple2<String, Integer>> map = input.map(new MapFunction<String, Tuple2<String, Integer>>() {
    @Override
    public Tuple2<String, Integer> map(String s) throws Exception {
        return new Tuple2<>(s, (int) (Math.random() * 10));
    }
});
SplitStream<Tuple2<String, Integer>> split = map.split(new OutputSelector<Tuple2<String, Integer>>() {
    @Override
    public Iterable<String> select(Tuple2<String, Integer> value) {
        if (value.f1 % 2 == 0) {
            // an element may carry several tags at once
            return Arrays.asList("even", "even2");
        } else {
            return Arrays.asList("odd");
        }
    }
});
split.select("even").print("even");
split.select("even2").print("even2");
split.select("odd").print("odd");
split.select("even", "odd").print("all");
Program output:
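The tag-then-filter semantics can be sketched in plain Java (class name `SplitSketch` invented here): splitting assigns one or more tags per element, and selecting with one or more tags returns each element at most once if any of its tags match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java sketch of split/select: split only assigns tags, select
// filters by tag. An element tagged "even" and "even2" appears in both
// of those selections, but only once within any single selection.
public class SplitSketch {
    public static List<String> tags(int v) {
        return v % 2 == 0 ? Arrays.asList("even", "even2") : Arrays.asList("odd");
    }

    public static List<Integer> select(List<Integer> values, String... wanted) {
        Set<String> want = new HashSet<>(Arrays.asList(wanted));
        List<Integer> out = new ArrayList<>();
        for (int v : values) {
            for (String t : tags(v)) {
                if (want.contains(t)) {
                    out.add(v);
                    break; // at most once per element per selection
                }
            }
        }
        return out;
    }
}
```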
4. Iterate: this operator suits iterative computation. The scenario implemented below: receive an element from Kafka and initialize a three-tuple from it, where the first field is the received value, the second field starts at 0 and accumulates across iterations, and the third field starts at 0 and counts iterations. On each iteration, add a random number between 0 and 9 to the second field and increment the third field by 1. When the second field exceeds 100, the tuple is emitted; otherwise it is fed back into the loop for another iteration.
input.print();
SingleOutputStreamOperator<Tuple3<String, Integer, Integer>> map = input.map(new MapFunction<String, Tuple3<String, Integer, Integer>>() {
    @Override
    public Tuple3<String, Integer, Integer> map(String s) throws Exception {
        return new Tuple3<>(s, 0, 0);
    }
});
IterativeStream<Tuple3<String, Integer, Integer>> iterate = map.iterate();
SingleOutputStreamOperator<Tuple3<String, Integer, Integer>> map1 = iterate.map(new MapFunction<Tuple3<String, Integer, Integer>, Tuple3<String, Integer, Integer>>() {
    @Override
    public Tuple3<String, Integer, Integer> map(Tuple3<String, Integer, Integer> t) throws Exception {
        System.out.println("f1:" + t.f1);
        return new Tuple3<>(t.f0, t.f1 + (int) (Math.random() * 10), t.f2 + 1);
    }
});
SplitStream<Tuple3<String, Integer, Integer>> split = map1.split(new OutputSelector<Tuple3<String, Integer, Integer>>() {
    @Override
    public Iterable<String> select(Tuple3<String, Integer, Integer> value) {
        if (value.f1 > 100) {
            return Arrays.asList("out");
        } else {
            return Arrays.asList("iterate");
        }
    }
});
// feed the "iterate" portion back to the head of the loop
iterate.closeWith(split.select("iterate"));
DataStream<Tuple3<String, Integer, Integer>> out = split.select("out");
out.print();
Kafka input:
Program output:
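The feedback loop above is equivalent, for a single element, to an ordinary while loop; a plain-Java sketch (class name `IterateSketch` invented here) makes the termination condition explicit:

```java
import java.util.Random;

// Plain-Java sketch of the iterate feedback loop for one element:
// keep adding a random 0..9 to f1 and incrementing f2 until f1
// exceeds 100, mirroring iterate.closeWith(split.select("iterate")).
public class IterateSketch {
    public static int[] run(String key, Random rnd) {
        int f1 = 0, f2 = 0;        // initial tuple (key, 0, 0)
        while (f1 <= 100) {        // "iterate" tag: feed back into the loop
            f1 += rnd.nextInt(10); // add a random number from 0 to 9
            f2++;                  // count iterations
        }
        return new int[]{f1, f2};  // "out" tag: emit the final values
    }
}
```

Since each step adds at most 9, at least twelve iterations are needed before f1 can exceed 100.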