I. The Phenomenon
1. Problem: in Spark and Flink you may run into a java.io.NotSerializableException; static variables may also cause thread-safety problems.
2. Demo
import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.streaming.api.scala._
import scala.collection.mutable.ListBuffer

/**
 * Demonstrates java.io.NotSerializableException
 */
object DemoFlink2 {
  class CountTest(var i: Int)

  def main(args: Array[String]): Unit = {
    val environment: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
    environment.setParallelism(10)
    val buffer: ListBuffer[String] = ListBuffer.apply()
    for (i <- 1 to 100) {
      buffer.append(s"msg$i")
    }
    val data: DataStream[String] = environment.fromCollection(buffer)
    // countTest is captured by the map closure below, so Flink must serialize it
    val countTest: CountTest = new CountTest(0)
    val data1: DataStream[String] = data.map(new RichMapFunction[String, String] {
      override def map(value: String): String = {
        Thread.sleep(1000)
        val name: Int = getRuntimeContext.getIndexOfThisSubtask
        countTest.i = countTest.i + 1
        val msg = name + ":" + countTest.i
        msg
      }
    })
    data1.print()
    environment.execute()
  }
}
/**
 * Demonstrates the static-variable thread-safety problem
 */
object DemoFlink3 {
  var count: Int = 0

  def main(args: Array[String]): Unit = {
    val environment: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
    environment.setParallelism(10)
    val buffer: ListBuffer[String] = ListBuffer.apply()
    for (i <- 1 to 100) {
      buffer.append(s"msg$i")
    }
    val data: DataStream[String] = environment.fromCollection(buffer)
    val data1: DataStream[String] = data.map(new RichMapFunction[String, String] {
      override def map(value: String): String = {
        Thread.sleep(1000)
        val name: Int = getRuntimeContext.getIndexOfThisSubtask
        // unsynchronized read-modify-write on the shared static field
        count = count + 1
        val msg = name + ":" + count
        msg
      }
    })
    data1.print()
    environment.execute()
  }
}
The two programs above are both implemented on Flink:
DemoFlink2: creates a countTest object, captures it inside the map operator, and runs the accumulation with a parallelism of 10.
Result: a java.io.NotSerializableException is thrown. If we add the case keyword in front of CountTest so that it becomes serializable, the job runs normally and every subtask counts from 0 to 10, because each parallel instance receives its own deserialized copy of the object.
This demonstrates the java.io.NotSerializableException phenomenon.
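The failure can be reproduced without Flink at all: Flink ships closure-captured objects to the parallel task instances via Java serialization, which rejects any object whose class does not implement java.io.Serializable. A minimal sketch of that check (the class names here are illustrative, not from the original code):

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// A plain class, like CountTest above: it does NOT implement Serializable.
class PlainCount(var i: Int)

// Adding `case` makes the class serializable automatically
// (case classes mix in the Serializable trait).
case class SerializableCount(var i: Int)

// Attempts Java serialization, the same mechanism Flink uses to ship
// a closure-captured object to the parallel task instances.
def canSerialize(obj: AnyRef): Boolean =
  try {
    new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
    true
  } catch {
    case _: NotSerializableException => false
  }
```

This is also why the `case` keyword fixes DemoFlink2: the captured object can now be serialized, and each subtask deserializes its own private copy.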
DemoFlink3: increments the static variable count inside the map operator with a parallelism of 10.
The output only reaches 87 instead of 100, with duplicated and lost values along the way.
This demonstrates the thread-safety problem of a static variable under multiple parallel subtasks.
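The lost updates in DemoFlink3 are the classic unsynchronized read-modify-write race: `count = count + 1` compiles to a read, an add, and a write, and two subtasks in the same JVM can interleave those steps. A minimal sketch outside Flink, with java.util.concurrent.atomic.AtomicInteger shown as one possible fix (all names here are illustrative):

```scala
import java.util.concurrent.atomic.AtomicInteger

// Runs `threads` threads, each invoking `body` `perThread` times, and waits for all.
def runThreads(threads: Int, perThread: Int)(body: () => Unit): Unit = {
  val ts = (1 to threads).map(_ => new Thread(() => (1 to perThread).foreach(_ => body())))
  ts.foreach(_.start())
  ts.foreach(_.join())
}

// Unsynchronized counter: `n += 1` is read + add + write, so concurrent
// increments can interleave and overwrite each other (lost updates).
def unsafeTotal(threads: Int = 10, perThread: Int = 10000): Int = {
  var n = 0
  runThreads(threads, perThread)(() => n += 1)
  n
}

// AtomicInteger performs the increment as one atomic compare-and-swap,
// so no update is ever lost: the result is always threads * perThread.
def safeTotal(threads: Int = 10, perThread: Int = 10000): Int = {
  val n = new AtomicInteger(0)
  runThreads(threads, perThread)(() => n.incrementAndGet())
  n.get()
}
```

Note that in a real Flink job, sharing mutable state across subtasks through a static field is still a design problem even when the access is atomic; per-subtask state is the idiomatic approach.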
II. Analysis
1. First, let's look at the bytecode generated by compiling DemoFlink2 (javap output):
public final class lfdemo.DemoFlink2$ {
public static final lfdemo.DemoFlink2$ MODULE$;
public static {};
Code:
0: new #2 // class lfdemo/DemoFlink2$
3: invokespecial #12 // Method "<init>":()V
6: return
public void main(java.lang.String[]);
Code:
0: getstatic #19 // Field org/apache/flink/streaming/api/scala/StreamExecutionEnvironment$.MODULE$:Lorg/apache/flink/streaming/api/scala/StreamExecutionEnvironment$;
3: invokevirtual #23 // Method org/apache/flink/streaming/api/scala/StreamExecutionEnvironment$.getExecutionEnvironment:()Lorg/apache/flink/streaming/api/scala/StreamExecutionEnvironment;
6: astore_2
7: aload_2
8: bipush 10
10: invokevirtual #29 // Method org/apache/flink/streaming/api/scala/StreamExecutionEnvironment.setParallelism:(I)V
13: getstatic #34 // Field scala/collection/mutable/ListBuffer$.MODULE$:Lscala/collection/mutable/ListBuffer$;
16: getstatic #39 // Field scala/collection/immutable/Nil$.MODULE$:Lscala/collection/immutable/Nil$;
19: invokevirtual #43 // Method scala/collection/mutable/ListBuffer$.apply:(Lscala/collection/Seq;)Lscala/collection/GenTraversable;
22: checkca