Flink AsyncWaitOperator pitfall: java.io.EOFException: No more bytes left

Background

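A recent project uses Flink for real-time stream processing. The job graph has many operators, and one of them depends on data looked up from external storage, so AsyncWaitOperator was the natural choice for doing that lookup asynchronously. It went straight to production once written. Some time later, an iteration required adding a double field to the AsyncWaitOperator input object. I assumed Flink would handle this compatibly, but after deployment the job failed to start when restoring from state. The exception log showed java.io.EOFException: No more bytes left. The full log follows: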

[Full restart]java.lang.Exception: Exception while creating StreamOperatorStateContext.
	at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:199)
	at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:303)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:1170)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:459)
	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:937)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:692)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.util.FlinkException: Could not restore operator state backend for AsyncWaitOperator_f9688c8e333fe3850523a1cc41cc2cb6_(378/400) from any of the 1 provided restore options.
	at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
	at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.operatorStateBackend(StreamTaskStateInitializerImpl.java:259)
	at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:147)
	... 6 more
Caused by: org.apache.flink.runtime.state.BackendBuildingException: Failed when trying to restore operator state backend
	at org.apache.flink.runtime.state.DefaultOperatorStateBackendBuilder.build(DefaultOperatorStateBackendBuilder.java:88)
	at org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createOperatorStateBackend(RocksDBStateBackend.java:631)
	at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$operatorStateBackend$0(StreamTaskStateInitializerImpl.java:250)
	at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:142)
	at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:121)
	... 8 more
Caused by: java.io.EOFException: No more bytes left.
	at org.apache.flink.api.java.typeutils.runtime.NoFetchingInput.require(NoFetchingInput.java:79)
	at com.esotericsoftware.kryo.io.Input.readLong(Input.java:668)
	at com.esotericsoftware.kryo.io.Input.readDouble(Input.java:799)
	at com.esotericsoftware.kryo.serializers.UnsafeCacheFields$UnsafeDoubleField.read(UnsafeCacheFields.java:180)
	at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:528)
	at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:761)
	at org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.deserialize(KryoSerializer.java:346)
	at org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.deserialize(StreamElementSerializer.java:227)
	at org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.deserialize(StreamElementSerializer.java:48)
	at org.apache.flink.runtime.state.OperatorStateRestoreOperation.deserializeOperatorStateValues(OperatorStateRestoreOperation.java:191)
	at org.apache.flink.runtime.state.OperatorStateRestoreOperation.restore(OperatorStateRestoreOperation.java:165)
	at org.apache.flink.runtime.state.DefaultOperatorStateBackendBuilder.build(DefaultOperatorStateBackendBuilder.java:85)
	... 12 more

Analysis

The exception log tells us a few things:

  1. The error is related to serialization, org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.deserialize, and the serializer in use is Kryo
  2. It is raised from the AsyncWaitOperator: Could not restore operator state backend for AsyncWaitOperator_f9688c8e333fe3850523a1cc41cc2cb6_(378/400) from any of the 1 provided restore options
  3. The key phrase in the exception is No more bytes left

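AsyncWaitOperator does not explicitly use any user-defined state, but when it takes a snapshot it serializes the elements still in its async queue into operator state. So the new double field in the input object evidently made deserialization of those queued elements fail, which in turn made the restore from state fail.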
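To make the failure mode concrete, here is a minimal sketch in plain Java. It is an analogy, not Kryo's or Flink's actual code: it assumes a positional binary format with no schema information, which is how Kryo's default FieldSerializer behaves. The class and method names (PositionalFormatDemo, writeOldLayout, readNewLayout) are made up for illustration. Bytes written with the old layout (a, b) are read back with the new layout (a, b, c), and the reader runs out of bytes on the last field.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;

public class PositionalFormatDemo {

    // Old layout: what a snapshot taken before the change would contain (a, b).
    public static byte[] writeOldLayout(long a, long b) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(a);
            out.writeLong(b);
        }
        return bytes.toByteArray();
    }

    // New layout: the reader now expects (a, b, c), in that order.
    public static String readNewLayout(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            long a = in.readLong();
            long b = in.readLong();
            double c = in.readDouble(); // throws EOFException when the data holds only a and b
            return a + "," + b + "," + c;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] oldState = writeOldLayout(1L, 2L);
        try {
            System.out.println(readNewLayout(oldState));
        } catch (EOFException e) {
            System.out.println("EOFException while reading field c");
        }
    }
}
```

The stack trace above has the same shape: FieldSerializer.read reaches UnsafeDoubleField.read, calls readDouble, and the input has nothing left to give.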

The input used in the project is a public class with the following structure:

@Data
@Builder
public class TestAsyncWaitOperatorInput {
    private long a;
    private long b;
    private double c; // field added in this change; it makes AsyncWaitOperator fail to restore from state
}

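TestAsyncWaitOperatorInput, the AsyncWaitOperator input, should have been a POJO type, serialized with PojoSerializer (which uses Kryo as a configurable fallback). Yet the exception comes from Kryo, so it looks as if Flink did not recognize this input type as a POJO; in other words, the input is not a POJO in the strict sense.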

Here is Flink's definition of a POJO type again:

Rules for POJO types
Flink recognizes a data type as a POJO type (and allows “by-name” field referencing) if the following conditions are fulfilled:

  1. The class is public and standalone (no non-static inner class)

  2. The class has a public no-argument constructor

  3. All non-static, non-transient fields in the class (and all
    superclasses) are either public (and non-final) or have a public
    getter- and a setter- method that follows the Java beans naming
    conventions for getters and setters.

Note that when a user-defined data type can’t be recognized as a POJO type, it must be processed as GenericType and serialized with Kryo.
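The rules above can be approximated with a few lines of reflection. The sketch below is a simplified stand-in for Flink's own type analysis, not a copy of it; PojoRuleCheck, WithBuilderOnly and PlainPojo are hypothetical names, and WithBuilderOnly only mimics the constructor shape that a builder-only class ends up with.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Locale;

public class PojoRuleCheck {

    // Mimics a builder-only class: the sole constructor is package-private with arguments.
    public static class WithBuilderOnly {
        private long a;
        WithBuilderOnly(long a) { this.a = a; }
        public long getA() { return a; }
        public void setA(long a) { this.a = a; }
    }

    // Same field, but with a public no-argument constructor.
    public static class PlainPojo {
        private long a;
        public PlainPojo() { }
        public long getA() { return a; }
        public void setA(long a) { this.a = a; }
    }

    // Rule 2: a public no-argument constructor exists.
    public static boolean hasPublicNoArgConstructor(Class<?> type) {
        for (Constructor<?> c : type.getConstructors()) { // getConstructors() returns public ones only
            if (c.getParameterCount() == 0) {
                return true;
            }
        }
        return false;
    }

    // Rule 3 (simplified): each non-static, non-transient field is public and non-final,
    // or has a public bean-style getter and setter.
    public static boolean fieldsAccessible(Class<?> type) {
        for (Class<?> c = type; c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                int m = f.getModifiers();
                if (Modifier.isStatic(m) || Modifier.isTransient(m) || f.isSynthetic()) {
                    continue;
                }
                if (Modifier.isPublic(m) && !Modifier.isFinal(m)) {
                    continue;
                }
                if (!hasGetterAndSetter(type, f)) {
                    return false;
                }
            }
        }
        return true;
    }

    private static boolean hasGetterAndSetter(Class<?> type, Field f) {
        String name = f.getName();
        String suffix = name.substring(0, 1).toUpperCase(Locale.ROOT) + name.substring(1);
        try {
            type.getMethod("set" + suffix, f.getType());
        } catch (NoSuchMethodException e) {
            return false;
        }
        for (String prefix : new String[] {"get", "is"}) {
            try {
                type.getMethod(prefix + suffix);
                return true;
            } catch (NoSuchMethodException ignored) {
                // try the next prefix
            }
        }
        return false;
    }

    // Rule 1 plus the two checks above.
    public static boolean looksLikePojo(Class<?> type) {
        int m = type.getModifiers();
        boolean publicStandalone =
                Modifier.isPublic(m) && (type.getEnclosingClass() == null || Modifier.isStatic(m));
        return publicStandalone && hasPublicNoArgConstructor(type) && fieldsAccessible(type);
    }

    public static void main(String[] args) {
        System.out.println(looksLikePojo(WithBuilderOnly.class)); // false: no public no-arg constructor
        System.out.println(looksLikePojo(PlainPojo.class));       // true
    }
}
```

In a project with Flink on the classpath, the authoritative answer comes from Flink itself, for example by checking whether TypeInformation.of(TestAsyncWaitOperatorInput.class) is a PojoTypeInfo rather than a GenericTypeInfo.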

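Checking this carefully, the case seems solved. Look at the second rule, then at the annotations on our class: there is a @Builder annotation.
As a result the class has no no-argument constructor, which breaks the POJO rules. The compiled class file, decompiled, is shown below: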

public class TestAsyncWaitOperatorInput {
    private long a;
    private long b;
    private double c;

    TestAsyncWaitOperatorInput(long a, long b, double c) {
        this.a = a;
        this.b = b;
        this.c = c;
    }

    public static TestAsyncWaitOperatorInputBuilder builder() {
        return new TestAsyncWaitOperatorInputBuilder();
    }

    public long getA() {
        return this.a;
    }

    public long getB() {
        return this.b;
    }

    public double getC() {
        return this.c;
    }

    public void setA(long a) {
        this.a = a;
    }

    public void setB(long b) {
        this.b = b;
    }

    public void setC(double c) {
        this.c = c;
    }

    public boolean equals(Object o) {
        if (o == this) {
            return true;
        } else if (!(o instanceof TestAsyncWaitOperatorInput)) {
            return false;
        } else {
            TestAsyncWaitOperatorInput other = (TestAsyncWaitOperatorInput)o;
            if (!other.canEqual(this)) {
                return false;
            } else if (this.getA() != other.getA()) {
                return false;
            } else if (this.getB() != other.getB()) {
                return false;
            } else {
                return Double.compare(this.getC(), other.getC()) == 0;
            }
        }
    }

    protected boolean canEqual(Object other) {
        return other instanceof TestAsyncWaitOperatorInput;
    }

    public int hashCode() {
        int PRIME = 59; // the decompiler rendered this constant as "true"; 59 is the multiplier used below
        int result = 1;
        long $a = this.getA();
        result = result * 59 + (int)($a >>> 32 ^ $a);
        long $b = this.getB();
        result = result * 59 + (int)($b >>> 32 ^ $b);
        long $c = Double.doubleToLongBits(this.getC());
        result = result * 59 + (int)($c >>> 32 ^ $c);
        return result;
    }

    public String toString() {
        long var10000 = this.getA();
        return "TestAsyncWaitOperatorInput(a=" + var10000 + ", b=" + this.getB() + ", c=" + this.getC() + ")";
    }

    public static class TestAsyncWaitOperatorInputBuilder {
        private long a;
        private long b;
        private double c;

        TestAsyncWaitOperatorInputBuilder() {
        }

        public TestAsyncWaitOperatorInputBuilder a(long a) {
            this.a = a;
            return this;
        }

        public TestAsyncWaitOperatorInputBuilder b(long b) {
            this.b = b;
            return this;
        }

        public TestAsyncWaitOperatorInputBuilder c(double c) {
            this.c = c;
            return this;
        }

        public TestAsyncWaitOperatorInput build() {
            return new TestAsyncWaitOperatorInput(this.a, this.b, this.c);
        }

        public String toString() {
            return "TestAsyncWaitOperatorInput.TestAsyncWaitOperatorInputBuilder(a=" + this.a + ", b=" + this.b + ", c=" + this.c + ")";
        }
    }
}

Verification

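In the end, after removing @Builder and recompiling, @Data generates a no-argument constructor, the class satisfies the POJO rules, and Flink's POJO serializer can serialize it directly and supports adding fields. So this was a @Builder pitfall. Use @Builder with caution from now on, especially on classes that get serialized.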
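If the builder is still wanted, one option is to keep a public no-argument constructor next to it. With Lombok this is usually done by adding @NoArgsConstructor and @AllArgsConstructor alongside @Builder. The sketch below writes the same shape out by hand so that it compiles without Lombok; InputWithBuilder is a hypothetical name and this is not the project's actual class.

```java
public class InputWithBuilder {
    private long a;
    private long b;
    private double c;

    // Public no-argument constructor: the part the Flink POJO rules require.
    public InputWithBuilder() { }

    // All-args constructor used only by the builder.
    private InputWithBuilder(long a, long b, double c) {
        this.a = a;
        this.b = b;
        this.c = c;
    }

    public static Builder builder() {
        return new Builder();
    }

    public long getA() { return a; }
    public void setA(long a) { this.a = a; }
    public long getB() { return b; }
    public void setB(long b) { this.b = b; }
    public double getC() { return c; }
    public void setC(double c) { this.c = c; }

    public static class Builder {
        private long a;
        private long b;
        private double c;

        public Builder a(long a) { this.a = a; return this; }
        public Builder b(long b) { this.b = b; return this; }
        public Builder c(double c) { this.c = c; return this; }

        public InputWithBuilder build() {
            return new InputWithBuilder(a, b, c);
        }
    }

    public static void main(String[] args) {
        InputWithBuilder built = InputWithBuilder.builder().a(1L).b(2L).c(3.5).build();
        InputWithBuilder empty = new InputWithBuilder(); // what a POJO-style serializer needs to be able to call
        System.out.println(built.getA() + " " + built.getB() + " " + built.getC());
        System.out.println(empty.getC());
    }
}
```

Flink's ExecutionConfig also offers disableGenericTypes(), which makes a job fail when a type would fall back to Kryo, so a mistake like this one surfaces before anything is written to state.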
