Closure serialization problems in Flink and Spark

This post analyzes closure-serialization problems encountered in Flink and Spark and the java.io.NotSerializableException they raise. Using the sample code DemoFlink2 and DemoFlink3, it explains the challenges static variables pose for thread safety and serialization, and analyzes the observed behavior. When dealing with such problems, consider using Flink's state mechanism to avoid them.

I. Phenomenon

1. Problem: in Spark and Flink jobs a java.io.NotSerializableException may be thrown,
and static variables may additionally suffer from thread-safety problems.
2. Demo

import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.streaming.api.scala._

import scala.collection.mutable.ListBuffer

/**
  * Reproduces java.io.NotSerializableException
  */
object DemoFlink2 {

  class CountTest(var i:Int)

  def main(args: Array[String]): Unit = {
    val environment: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
    environment.setParallelism(10)

    val buffer: ListBuffer[String] = ListBuffer.apply()
    for(i <-1 to 100){
      buffer.append(s"msg$i")
    }
    val data: DataStream[String] = environment.fromCollection(buffer)

    val countTest: CountTest = new CountTest(0)

    val data1: DataStream[String] = data.map(new RichMapFunction[String, String] {
      override def map(value: String): String = {
        Thread.sleep(1000)
        val subtask: Int = getRuntimeContext.getIndexOfThisSubtask
        countTest.i = countTest.i + 1
        val msg = subtask + ":" + countTest.i
        msg
      }
    })
    data1.print()
    environment.execute()
  }
}

import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.streaming.api.scala._

import scala.collection.mutable.ListBuffer

/**
  * Reproduces the thread-safety problem of a static variable
  */
object DemoFlink3 {

  var count: Int = 0

  def main(args: Array[String]): Unit = {
    val environment: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
    environment.setParallelism(10)

    val buffer: ListBuffer[String] = ListBuffer.apply()
    for(i <-1 to 100){
      buffer.append(s"msg$i")
    }
    val data: DataStream[String] = environment.fromCollection(buffer)


    val data1: DataStream[String] = data.map(new RichMapFunction[String, String] {
      override def map(value: String): String = {
        Thread.sleep(1000)
        val subtask: Int = getRuntimeContext.getIndexOfThisSubtask
        count = count + 1
        val msg = subtask + ":" + count
        msg
      }
    })
    data1.print()
    environment.execute()
  }
}

Both demos above are implemented with Flink.
DemoFlink2 creates a countTest object, captures it in the map operator, and runs the increment with a parallelism of 10.
Observed behavior: a java.io.NotSerializableException is thrown. If we put the case keyword in front of CountTest so that it becomes serializable, the job runs normally, and each parallel subtask increments its own deserialized copy from 0 up to 10.
This demonstrates the java.io.NotSerializableException.
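The same failure can be reproduced outside Flink with plain Java serialization, which is what Flink applies to a function and its captured fields. A minimal sketch (object and method names here are illustrative, not from the demo above), contrasting a plain class with a case class, which automatically extends Serializable:

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object SerializationSketch {

  // Mirrors CountTest: a plain class with no Serializable marker
  class PlainCount(var i: Int)

  // A case class automatically extends Serializable
  case class CaseCount(var i: Int)

  // Serialize obj the same way Java serialization would inside Flink
  def canSerialize(obj: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    println(canSerialize(new PlainCount(0))) // false
    println(canSerialize(CaseCount(0)))      // true
  }
}
```

This is why adding the case keyword to CountTest makes the job run: the captured object becomes serializable.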
DemoFlink3 captures the static variable count in the map operator and runs the increment with a parallelism of 10.
The output only reaches 87 instead of 100, with duplicated and lost values along the way.
This demonstrates the thread-safety problem of a static variable under multiple parallel subtasks.
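The lost updates in DemoFlink3 are an ordinary race condition on a shared var and can be reproduced without Flink. A minimal sketch (names are illustrative), contrasting the unsynchronized counter with an AtomicInteger:

```scala
import java.util.concurrent.atomic.AtomicInteger

object RaceSketch {

  var unsafeCount: Int = 0             // shared mutable state, like DemoFlink3's count
  val safeCount = new AtomicInteger(0) // atomic read-modify-write

  def run(): (Int, Int) = {
    val threads = (1 to 10).map { _ =>
      new Thread(() => {
        var n = 0
        while (n < 10000) {
          unsafeCount += 1            // three steps: read, add, write; updates can be lost
          safeCount.incrementAndGet() // one atomic step
          n += 1
        }
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
    (unsafeCount, safeCount.get())
  }

  def main(args: Array[String]): Unit = {
    val (unsafe, safe) = run()
    println(s"unsafe: $unsafe") // usually less than 100000: some increments are lost
    println(s"safe: $safe")     // always 100000
  }
}
```

In a real job, per-subtask state (for example Flink's State mechanism, as suggested above) avoids the shared mutable variable entirely.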

II. Analysis of the phenomenon

1. Let's start with the bytecode generated when DemoFlink2 is compiled:

public final class lfdemo.DemoFlink2$ {
  public static final lfdemo.DemoFlink2$ MODULE$;

  public static {};
    Code:
       0: new           #2                  // class lfdemo/DemoFlink2$
       3: invokespecial #12                 // Method "<init>":()V
       6: return

  public void main(java.lang.String[]);
    Code:
       0: getstatic     #19                 // Field org/apache/flink/streaming/api/scala/StreamExecutionEnvironment$.MODULE$:Lorg/apache/flink/streaming/api/scala/StreamExecutionEnvironment$;
       3: invokevirtual #23                 // Method org/apache/flink/streaming/api/scala/StreamExecutionEnvironment$.getExecutionEnvironment:()Lorg/apache/flink/streaming/api/scala/StreamExecutionEnvironment;
       6: astore_2
       7: aload_2
       8: bipush        10
      10: invokevirtual #29                 // Method org/apache/flink/streaming/api/scala/StreamExecutionEnvironment.setParallelism:(I)V
      13: getstatic     #34                 // Field scala/collection/mutable/ListBuffer$.MODULE$:Lscala/collection/mutable/ListBuffer$;
      16: getstatic     #39                 // Field scala/collection/immutable/Nil$.MODULE$:Lscala/collection/immutable/Nil$;
      19: invokevirtual #43                 // Method scala/collection/mutable/ListBuffer$.apply:(Lscala/collection/Seq;)Lscala/collection/GenTraversable;
       22: checkcast
       ...
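The bytecode points at the root cause: an anonymous function is compiled into a class, and any outer value it uses is stored in a field of that class, so serializing the function drags every captured object along with it. A sketch of the same effect without Flink (class and method names are illustrative; Scala lambdas are themselves serializable, so only the captured object can fail):

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object CaptureSketch {

  class Holder(var i: Int) // not Serializable, like CountTest

  def serializes(obj: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    val holder = new Holder(0)

    // The compiled lambda class holds a reference to the captured `holder`,
    // so serializing the lambda tries to serialize the Holder and fails.
    val capturing: String => String = s => { holder.i += 1; s + holder.i }
    val plain: String => String = s => s // captures nothing

    println(serializes(capturing)) // false
    println(serializes(plain))     // true
  }
}
```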