Flink watermark does not trigger window computation: a failed attempt at showing off

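Things to watch out for:
1. Set the parallelism to 1.
2. Start nc first and send some data, even if it is only a single record (why?), and only then start the program from IDEA.
Prepare some source data in advance: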

01,1586489566000
01,1586489567000
01,1586489568000
01,1586489569000
01,1586489570000
01,1586489571000
01,1586489572000
01,1586489573000
01,1586489574000
01,1586489575000
01,1586489576000
01,1586489577000
01,1586489578000
01,1586489579000
01,1586489589000

Code:

import org.apache.flink.api.common.eventtime.*;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.ConfigConstants;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.datastream.WindowedStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.windowing.WindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;

public class WatermarkDemo {
    public static void main(String[] args) throws Exception {
        Configuration config = new Configuration();
//        config.setBoolean(ConfigConstants.LOCAL_START_WEBSERVER, true);  // tried this to get the web UI when running from IDEA; it made no difference
//        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(config);
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
//        env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime);
//        env.getConfig().setAutoWatermarkInterval(1000L);
        env.setParallelism(1);
        // Read lines of the form "key,timestampMillis" from the nc server
        DataStreamSource<String> data = env.socketTextStream("hdp-1", 7777);
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS");
        // runs only once
//        System.out.println("----------------------" + data.toString());
        SingleOutputStreamOperator<Tuple2<String, Long>> maped = data.map(new MapFunction<String, Tuple2<String, Long>>() {
            @Override
            public Tuple2<String, Long> map(String value) throws Exception {
                String[] split = value.split(",");
                return new Tuple2<String, Long>(split[0], Long.valueOf(split[1]));
            }
        });
        SingleOutputStreamOperator<Tuple2<String, Long>> watermarks = maped.assignTimestampsAndWatermarks(new WatermarkStrategy<Tuple2<String, Long>>() {
            @Override
            public WatermarkGenerator<Tuple2<String, Long>> createWatermarkGenerator(WatermarkGeneratorSupplier.Context context) {
                return new WatermarkGenerator<Tuple2<String, Long>>() {
                    private long maxTimeStamp = Long.MIN_VALUE;

                    @Override
                    public void onEvent(Tuple2<String, Long> event, long eventTimestamp, WatermarkOutput output) {
                        maxTimeStamp = Math.max(maxTimeStamp, event.f1);
                        System.out.println("maxTimeStamp:" + maxTimeStamp + "...format:" + sdf.format(maxTimeStamp));
                    }

                    @Override
                    public void onPeriodicEmit(WatermarkOutput output) {
//                        System.out.println(".....onPeriodicEmit....");
                        long maxOutOfOrderness = 1000;
                        Watermark watermark = new Watermark(maxTimeStamp - maxOutOfOrderness);
//                        System.out.println("watermark time:"+watermark.getTimestamp()+",eventtime="+eventtime);
                        output.emitWatermark(watermark);
                    }
                };
            }
        }.withTimestampAssigner(new SerializableTimestampAssigner<Tuple2<String, Long>>() {
            @Override
            public long extractTimestamp(Tuple2<String, Long> element, long recordTimestamp) {
                return element.f1;
            }
        }));

        KeyedStream<Tuple2<String, Long>, String> keyed = watermarks.keyBy(value -> value.f0);
//        System.out.println("...keyed:" + keyed);


        WindowedStream<Tuple2<String, Long>, String, TimeWindow> windowed = keyed.window(TumblingEventTimeWindows.of(Time.seconds(5)));
//        WindowedStream<Tuple2<String, Long>, String, TimeWindow> windowed = keyed.timeWindow(Time.seconds(5));

      
        SingleOutputStreamOperator<String> result = windowed.apply(new WindowFunction<Tuple2<String, Long>, String, String, TimeWindow>() {
            @Override
            public void apply(String s, TimeWindow window, Iterable<Tuple2<String, Long>> input, Collector<String> out) throws Exception {

                System.out.println("..." + sdf.format(window.getStart()));
                String key = s;
                Iterator<Tuple2<String, Long>> iterator = input.iterator();
                ArrayList<Long> list = new ArrayList<>();
                while (iterator.hasNext()) {
                    Tuple2<String, Long> next = iterator.next();
                    list.add(next.f1);
                }
                Collections.sort(list);
                String result = "key:" + key + "..." + "list.size:" + list.size() + "...list.first:" + sdf.format(list.get(0)) + "...list.last:" + sdf.format(list.get(list.size() - 1)) + "...window.start:" + sdf.format(window.getStart()) + "..window.end:" + sdf.format(window.getEnd());
                out.collect(result);
            }
        });

        result.print();
        env.execute();
    }

}



2021/1/11
How window triggering works
Window assigner: TumblingEventTimeWindows.of(Time.seconds(5)) creates a new tumbling window every 5 seconds

WindowedStream<Tuple2<String, Long>, String, TimeWindow> windowed = keyed.window(TumblingEventTimeWindows.of(Time.seconds(5)));

How exactly are elements assigned to windows? Its Javadoc says:
A {@link WindowAssigner} that windows elements into windows based on the timestamp of the
elements. Windows cannot overlap.
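
To make the assignment concrete, here is a minimal sketch of how a timestamp maps to a 5-second tumbling window. It assumes windows are aligned to the epoch with no offset and that the window's max timestamp is its end minus 1 ms. WindowStartDemo and windowStart are names made up for this illustration, not Flink API.

```java
class WindowStartDemo {
    // Start of the tumbling window (no offset) that contains the given timestamp.
    static long windowStart(long timestampMillis, long windowSizeMillis) {
        return timestampMillis - Math.floorMod(timestampMillis, windowSizeMillis);
    }

    public static void main(String[] args) {
        long size = 5000L; // corresponds to Time.seconds(5)
        long[] samples = {1586489566000L, 1586489569000L, 1586489570000L, 1586489589000L};
        for (long ts : samples) {
            long start = windowStart(ts, size);
            long end = start + size; // exclusive upper bound
            System.out.println(ts + " -> [" + start + ", " + end + "), maxTimestamp=" + (end - 1));
        }
    }
}
```

With the sample data above, the first four records (…566000 to …569000) land in the window starting at 1586489565000, and 1586489570000 opens the next one.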

First, trace into TumblingEventTimeWindows and look at its trigger:

@Override
public Trigger<Object, TimeWindow> getDefaultTrigger(StreamExecutionEnvironment env) {
	return EventTimeTrigger.create();
}

It uses EventTimeTrigger, so keep digging and look at the firing logic inside it:

@Override
public TriggerResult onElement(Object element, long timestamp, TimeWindow window, TriggerContext ctx) throws Exception {
	if (window.maxTimestamp() <= ctx.getCurrentWatermark()) {
		// if the watermark is already past the window fire immediately
		return TriggerResult.FIRE;
	} else {
		ctx.registerEventTimeTimer(window.maxTimestamp());
		return TriggerResult.CONTINUE;
	}
}

@Override
public TriggerResult onEventTime(long time, TimeWindow window, TriggerContext ctx) {
	return time == window.maxTimestamp() ?
		TriggerResult.FIRE :
		TriggerResult.CONTINUE;
}

From the trigger it is clear that FIRE can only be returned when onElement or onEventTime is called.

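onElement
Look at onElement first. This function is called for every record that arrives in the stream, and its logic is:

If the window's max timestamp is less than or equal to the current watermark, fire the computation
Otherwise, register an event-time timer for the window's max timestamp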
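
The comparison in the trigger can be replayed against the sample data without running Flink. This is a simplified sketch: it pretends the watermark is recomputed after every event as "largest timestamp seen minus maxOutOfOrderness", whereas in the real job the watermark only advances on the next periodic emit. TriggerSimulation and firstFiringIndex are hypothetical names for this illustration.

```java
class TriggerSimulation {
    // Index of the first event after which the watermark (max timestamp seen minus
    // maxOutOfOrderness) reaches windowMaxTimestamp, or -1 if it never does.
    static int firstFiringIndex(long[] eventTimestamps, long maxOutOfOrderness, long windowMaxTimestamp) {
        long maxSeen = 0L;
        for (int i = 0; i < eventTimestamps.length; i++) {
            maxSeen = Math.max(maxSeen, eventTimestamps[i]);
            long watermark = maxSeen - maxOutOfOrderness;
            // Same comparison as in EventTimeTrigger.onElement
            if (windowMaxTimestamp <= watermark) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long[] events = {
            1586489566000L, 1586489567000L, 1586489568000L, 1586489569000L,
            1586489570000L, 1586489571000L, 1586489572000L
        };
        long windowMaxTimestamp = 1586489569999L; // window [1586489565000, 1586489570000)
        int idx = firstFiringIndex(events, 1000L, windowMaxTimestamp);
        System.out.println("first window can fire after event #" + idx + " (" + events[idx] + ")");
    }
}
```

So with a 1-second out-of-orderness bound, the first window should fire once the record 01,1586489571000 has been seen. That is what I expected from the job, and it is not what happened.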

2021.1.13
Finally found the cause: integer overflow!

private long maxTimeStamp = Long.MIN_VALUE;
Here the initial value of maxTimeStamp is set to the smallest value a long can hold.

Watermark watermark = new Watermark(maxTimeStamp - maxOutOfOrderness);
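Here another 1000 is subtracted from that minimum value, and the result comes out wrong, in a rather deceptive way.
The hexadecimal value
0x8000000000000000
is Long.MIN_VALUE, which in decimal is
-9223372036854775808
Mathematically, the subtraction should give: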

Watermark watermark = new Watermark(maxTimeStamp - maxOutOfOrderness);

maxTimeStamp - maxOutOfOrderness 
-9223372036854775808 - 1000 = -9223372036854776808

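However, that number does not fit into a 64-bit long, so the subtraction silently wraps around and the actual result is:

maxTimeStamp - maxOutOfOrderness 
Long.MIN_VALUE - 1000 = Long.MAX_VALUE - 999 = 9223372036854774808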
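
The wraparound is easy to reproduce in isolation. A small sketch (OverflowDemo and unsafeWatermark are made-up names) that performs the same subtraction as the generator, plus Math.subtractExact, which throws instead of wrapping:

```java
class OverflowDemo {
    // Same arithmetic as in onPeriodicEmit: plain long subtraction wraps silently.
    static long unsafeWatermark(long maxTimestamp, long maxOutOfOrderness) {
        return maxTimestamp - maxOutOfOrderness;
    }

    public static void main(String[] args) {
        long wm = unsafeWatermark(Long.MIN_VALUE, 1000L);
        System.out.println(wm);
        System.out.println(wm == Long.MAX_VALUE - 999);
        try {
            // The checked variant refuses to produce a wrapped result.
            Math.subtractExact(Long.MIN_VALUE, 1000L);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected: " + e.getMessage());
        }
    }
}
```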

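So the watermark ends up with the timestamp 9223372036854774808.
That timestamp lies hundreds of millions of years in the future. A watermark never moves backwards, so once this value has been emitted, every later (and much smaller) watermark passed to
output.emitWatermark(watermark);
is ignored by Flink and has no effect, and the windows never fire as expected. My guess is that this is also why sending data before starting the job helped: if a record arrives before the first periodic emit, maxTimeStamp already holds a real timestamp and the subtraction does not overflow.
So I played it safe and changed the code to: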

 private long maxTimeStamp = 0L;
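
Starting from 0L works here because every timestamp in the sample data is positive. A variant that avoids the overflow without assuming anything about the timestamps is to start just far enough above Long.MIN_VALUE that the subtraction cannot wrap. As far as I know, Flink's built-in bounded-out-of-orderness generator (reachable through WatermarkStrategy.forBoundedOutOfOrderness) uses a similar trick, but check the source of your Flink version. The sketch below is plain Java with a hypothetical class name, BoundedLatenessTracker, and note that it subtracts one extra millisecond compared with my generator above.

```java
class BoundedLatenessTracker {
    private final long maxOutOfOrderness;
    private long maxTimestamp;

    BoundedLatenessTracker(long maxOutOfOrderness) {
        this.maxOutOfOrderness = maxOutOfOrderness;
        // Chosen so that the first watermark is exactly Long.MIN_VALUE, with no wraparound.
        this.maxTimestamp = Long.MIN_VALUE + maxOutOfOrderness + 1;
    }

    // Call for every event, like WatermarkGenerator.onEvent.
    void onEvent(long eventTimestamp) {
        maxTimestamp = Math.max(maxTimestamp, eventTimestamp);
    }

    // Value that would be emitted in onPeriodicEmit.
    long currentWatermark() {
        return maxTimestamp - maxOutOfOrderness - 1;
    }

    public static void main(String[] args) {
        BoundedLatenessTracker tracker = new BoundedLatenessTracker(1000L);
        System.out.println("before any event: " + tracker.currentWatermark());
        tracker.onEvent(1586489571000L);
        System.out.println("after 1586489571000: " + tracker.currentWatermark());
    }
}
```

Before any event arrives the watermark stays at Long.MIN_VALUE, so an early periodic emit is harmless.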