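1. Overview
countWindow comes in two flavors: tumbling and sliding. The code and output below illustrate how countWindow() behaves. The data source emits ten elements, one per second: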
public static class StreamDataSource extends RichParallelSourceFunction<Tuple2<String, String>> {
    // set to false by cancel() to stop the emit loop
    private volatile boolean running = true;

    @Override
    public void run(SourceContext<Tuple2<String, String>> ctx) throws InterruptedException {
        // six elements for key "a", four for key "b"
        Tuple2[] elements = new Tuple2[]{
            Tuple2.of("a", "1"),
            Tuple2.of("a", "2"),
            Tuple2.of("a", "3"),
            Tuple2.of("a", "4"),
            Tuple2.of("a", "5"),
            Tuple2.of("a", "6"),
            Tuple2.of("b", "7"),
            Tuple2.of("b", "8"),
            Tuple2.of("b", "9"),
            Tuple2.of("b", "0")
        };
        int count = 0;
        // emit one element per second until the array is exhausted or the job is cancelled
        while (running && count < elements.length) {
            ctx.collect(new Tuple2<>((String) elements[count].f0, (String) elements[count].f1));
            count++;
            Thread.sleep(1000);
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}
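2. Tumbling count window
Set windowSize=3, so that the computation fires once for every 3 elements that arrive for a given key. This case is straightforward and needs no further explanation.
2.1 Code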
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction;

public class FlinkCountWindowDemo {
    // StreamDataSource from section 1 goes here as a nested class

    public static void main(String[] args) throws Exception {
        final ParameterTool params = ParameterTool.fromArgs(args);
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.getConfig().setGlobalJobParameters(params);
        env.setParallelism(1);
        final int windowSize = params.getInt("window", 3);

        // read source data
        DataStreamSource<Tuple2<String, String>> inStream = env.addSource(new StreamDataSource());

        // calculate: key by the first tuple field, then concatenate the values of each full window
        // (positional keyBy is deprecated in newer Flink releases in favor of a KeySelector)
        DataStream<Tuple2<String, String>> outStream = inStream
            .keyBy(0)
            .countWindow(windowSize)
            .reduce(
                new ReduceFunction<Tuple2<String, String>>() {
                    @Override
                    public Tuple2<String, String> reduce(Tuple2<String, String> value1, Tuple2<String, String> value2) throws Exception {
                        return Tuple2.of(value1.f0, value1.f1 + value2.f1);
                    }
                }
            );

        outStream.print();
        env.execute("WindowWordCount");
    }
}
2.2 Output
(a,123)
(a,456)
(b,789)
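2.3 Explanation
The output is missing one record, Tuple2.of("b", "0"). Key b receives only four elements: the first three fill a window and produce (b,789), while the fourth sits in a window that never reaches three elements, so it is never triggered. The output is as expected.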
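The dropped record can be reproduced without a cluster. The following is a minimal plain-Java sketch of the per-key counting logic described above; it is an illustration only, not Flink's implementation, and the class name TumblingCountWindowSketch and the method simulate are hypothetical names introduced here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Flink: simulates a keyed tumbling count window.
public class TumblingCountWindowSketch {

    /** Concatenates values per key and emits a result each time a key has collected windowSize elements. */
    public static List<String> simulate(String[][] elements, int windowSize) {
        Map<String, StringBuilder> buffers = new HashMap<>(); // per-key concatenated values
        Map<String, Integer> counts = new HashMap<>();        // per-key element count in the open window
        List<String> output = new ArrayList<>();
        for (String[] e : elements) {
            String key = e[0];
            buffers.computeIfAbsent(key, k -> new StringBuilder()).append(e[1]);
            int n = counts.merge(key, 1, Integer::sum);
            if (n == windowSize) {
                // window is full: emit and start a fresh window for this key
                output.add("(" + key + "," + buffers.get(key) + ")");
                buffers.remove(key);
                counts.remove(key);
            }
        }
        // anything still buffered here belongs to an incomplete window and is never emitted
        return output;
    }

    public static void main(String[] args) {
        String[][] elements = {
            {"a", "1"}, {"a", "2"}, {"a", "3"}, {"a", "4"}, {"a", "5"},
            {"a", "6"}, {"b", "7"}, {"b", "8"}, {"b", "9"}, {"b", "0"}
        };
        simulate(elements, 3).forEach(System.out::println);
    }
}
```

With the ten source elements and a window size of 3, the sketch emits the same three lines as section 2.2, and ("b", "0") stays in the buffer unemitted.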
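3. Sliding count window
Borrowing a figure from the article "Flink 原理与实现:Window 机制": suppose there is a sliding count window that, every 2 elements, computes the sum of the most recent 4 elements. The window then works as shown in the following diagram:
The code below concatenates strings instead of summing numbers, and uses a window size of 3 rather than 4, but otherwise follows nearly the same logic as the figure. Its output confirms the behavior the figure describes.
3.1 Code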
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction;

public class FlinkCountWindowDemo {
    // StreamDataSource from section 1 goes here as a nested class

    public static void main(String[] args) throws Exception {
        final ParameterTool params = ParameterTool.fromArgs(args);
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.getConfig().setGlobalJobParameters(params);
        env.setParallelism(1);
        final int windowSize = params.getInt("window", 3);
        final int slideSize = params.getInt("slide", 2);

        // read source data
        DataStreamSource<Tuple2<String, String>> inStream = env.addSource(new StreamDataSource());

        // calculate: every slideSize elements per key, concatenate the last windowSize values
        // (positional keyBy is deprecated in newer Flink releases in favor of a KeySelector)
        DataStream<Tuple2<String, String>> outStream = inStream
            .keyBy(0)
            .countWindow(windowSize, slideSize)
            .reduce(
                new ReduceFunction<Tuple2<String, String>>() {
                    @Override
                    public Tuple2<String, String> reduce(Tuple2<String, String> value1, Tuple2<String, String> value2) throws Exception {
                        return Tuple2.of(value1.f0, value1.f1 + value2.f1);
                    }
                }
            );

        outStream.print();
        env.execute("WindowWordCount");
    }
}
3.2 Output
(a,12)
(a,234)
(a,456)
(b,78)
(b,890)
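3.3 Explanation
In other words, each time two more elements arrive for a key, the computation runs over that key's most recent three elements. The first result for each key, (a,12) and (b,78), covers only two elements because the window is not yet full at that point. The result is as expected.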
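The sliding behavior can also be checked with a small plain-Java simulation. This is a sketch under the assumption that a sliding count window behaves like "fire every slideSize elements, keep at most the last windowSize elements per key"; as far as I know, Flink builds countWindow(size, slide) from a global window with a count trigger and a count evictor, which matches this model, but the code below does not use Flink at all. The class name SlidingCountWindowSketch and the method simulate are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Flink: simulates a keyed sliding count window.
public class SlidingCountWindowSketch {

    /** Every slideSize elements per key, emits the concatenation of that key's last windowSize values. */
    public static List<String> simulate(String[][] elements, int windowSize, int slideSize) {
        Map<String, Deque<String>> buffers = new HashMap<>(); // per-key retained values
        Map<String, Integer> counts = new HashMap<>();        // per-key total element count
        List<String> output = new ArrayList<>();
        for (String[] e : elements) {
            String key = e[0];
            Deque<String> buffer = buffers.computeIfAbsent(key, k -> new ArrayDeque<>());
            buffer.addLast(e[1]);
            int n = counts.merge(key, 1, Integer::sum);
            if (n % slideSize == 0) {
                // evict the oldest values so that at most windowSize remain, then emit
                while (buffer.size() > windowSize) {
                    buffer.removeFirst();
                }
                output.add("(" + key + "," + String.join("", buffer) + ")");
            }
        }
        return output;
    }

    public static void main(String[] args) {
        String[][] elements = {
            {"a", "1"}, {"a", "2"}, {"a", "3"}, {"a", "4"}, {"a", "5"},
            {"a", "6"}, {"b", "7"}, {"b", "8"}, {"b", "9"}, {"b", "0"}
        };
        simulate(elements, 3, 2).forEach(System.out::println);
    }
}
```

With windowSize=3 and slideSize=2 the sketch prints the same five lines as section 3.2. Calling simulate(elements, 4, 2) corresponds to the parameters used in the figure's description (most recent 4 elements, every 2 elements).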