java spark-streaming Kryo序列化缓冲区太小报错

博客内容描述了在使用Spark Streaming时遇到的Kryo序列化异常,问题出现在`spark.kryoserializer.buffer.max`参数设置过小,导致序列化过程中缓冲区溢出。解决方法是增大该参数值或者选择其他序列化方式。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

错误信息:
KryoException: Buffer overflow. Available: 0, required:XXX

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/hadoop/yarn/local/filecache/185/spark2-hdp-yarn-archive.tar.gz/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
21/07/07 16:58:15 WARN TaskSetManager: Lost task 88.0 in stage 3.0 (TID 1007, node181, executor 1): org.apache.spark.SparkException: Kryo serialization failed: Buffer overflow. Available: 0, required: 134217728. To avoid this, increase spark.kryoserializer.buffer.max value.
	at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:350)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:393)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: com.esotericsoftware.kryo.KryoException: Buffer overflow. Available: 0, required: 134217728
	at com.esotericsoftware.kryo.io.Output.require(Output.java:163)
	at com.esotericsoftware.kryo.io.Output.writeBytes(Output.java:246)
	at com.esotericsoftware.kryo.io.Output.write(Output.java:209)
	at org.apache.spark.sql.catalyst.expressions.UnsafeRow.write(UnsafeRow.java:676)
	at com.esotericsoftware.kryo.serializers.DefaultSerializers$KryoSerializableSerializer.write(DefaultSerializers.java:505)
	at com.esotericsoftware.kryo.serializers.DefaultSerializers$KryoSerializableSerializer.write(DefaultSerializers.java:503)
	at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
	at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:366)
	at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:307)
	at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
	at com.twitter.chill.Tuple3Serializer.write(TupleSerializers.scala:50)
	at com.twitter.chill.Tuple3Serializer.write(TupleSerializers.scala:45)
	at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
	at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:366)
	at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:307)
	at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
	at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:347)

问题分析:
spark-streaming提交任务的时候使用kryo序列化参数,程序在进行计算的过程中出来报错序列化缓冲大小的问题,sparkConf代码:

SparkConf conf = new SparkConf()
    .setMaster("local[*]")
    .set("spark.driver.allowMultipleContexts", "true")
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .set("spark.kryoserializer.buffer.max", "5")
    .setAppName("MergeConsumer");
JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

是由于对spark.kryoserializer.buffer.max=5该参数值设置过小导致,由于序列化写数据的时候需要对该参数进行校验,如果要写入的数据大于设置的最大值则会抛出该异常.。

解决方法:

  • 调整序序列化参数的最大值,比如1024m。
  • 不适用序列化方式,在sparkConf中去掉这个参数(去掉序列化可能会有其它错误)。
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

冰红茶不会渴

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值