[Tencent Cloud CKafka] Bandwidth throttling: Increase the fetch size on the client (using max.partition.fetch.bytes)

Problem description: A Structured Streaming job reads from Tencent Cloud CKafka. Whenever the consumption traffic exceeds the purchased peak consumer bandwidth, the Structured Streaming job dies with the following error log:

Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 6 in stage 126.0 failed 4 times, most recent failure: Lost task 6.3 in stage 126.0 (TID 1354, 10.123.42.47, executor 3): 
org.apache.kafka.common.errors.RecordTooLargeException: There are some messages at [Partition=Offset]: {test_A12-2=149689780}
 whose size is larger than the fetch size 1024000 and hence cannot be ever returned.
 Increase the fetch size on the client (using max.partition.fetch.bytes), 
or decrease the maximum message size the broker will allow (using message.max.bytes).
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1651)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1639)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1638)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1638)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:831)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1872)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1821)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1810)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:642)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2034)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2055)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2074)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2099)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:935)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:933)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:933)
at org.apache.spark.sql.execution.streaming.ForeachSink.addBatch(ForeachSink.scala:49)

Problem analysis:

step1: Going by the exception message, I first assumed the job had hit a message larger than max.partition.fetch.bytes. Every search result for this exception says the same thing: either decrease message.max.bytes on the broker or increase max.partition.fetch.bytes on the client. But after adjusting these parameters the error persisted. Batch-style jobs were hit especially hard: each batch pulls data in a burst, so the instantaneous traffic is very high, the job always crashed, and it could not run at all.
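For reference, this is roughly how the fetch-related options were raised in step 1. It is only a minimal sketch: the bootstrap address and byte values are placeholder examples, and the topic name is taken from the error log; Structured Streaming forwards options carrying the kafka. prefix to the underlying consumer.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("ckafka-read").getOrCreate()

// Consumer properties are passed to the Kafka client via the "kafka." prefix.
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "ckafka-host:9092")   // placeholder address
  .option("subscribe", "test_A12")                         // topic from the error log
  .option("kafka.max.partition.fetch.bytes", "10485760")   // example: 10 MB per partition
  .option("kafka.fetch.max.bytes", "52428800")             // example: 50 MB per fetch request
  .load()
```

As step 1 shows, raising these values did not help here, because the root cause was not an oversized message.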

step2: Since step 1 did not work, I opened a support ticket with Tencent Cloud, and they quickly set up a group chat to handle the issue. Their initial answer was that throttling shrinks the effective fetch.request.max.bytes of each fetch, and that we should solve the problem by buying more bandwidth. But shrinking the per-fetch size by itself should not make the job fail. After reading the Spark and Kafka source code, the real situation is this: when the consumer client reads the fetched data it hits an EOFException. Kafka's own mechanism treats an EOFException while reading the corresponding buffer as "all data has been read", and it then checks whether buffer.limit > 0; if so, it assumes it ran into a message larger than the fetch size and throws RecordTooLargeException (screenshots of the code are at the bottom if you want to look). The final conclusion is that Tencent Cloud truncates the message content when throttling, and the EOFException happens because the stream has no terminating marker for the truncated record.
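To make the mechanism concrete, here is a heavily simplified paraphrase in Scala of the behaviour described above. It is not the verbatim Kafka client code; readOneRecord and parseFetchedPartition are illustrative names only.

```scala
import java.io.EOFException
import java.nio.ByteBuffer
import org.apache.kafka.common.errors.RecordTooLargeException

// Hypothetical stand-in for iterating the fetched records: read one
// length-prefixed record, throwing EOFException if the payload is cut short.
def readOneRecord(buffer: ByteBuffer): Array[Byte] = {
  if (buffer.remaining() < 4) throw new EOFException("truncated size field")
  val size = buffer.getInt()
  if (buffer.remaining() < size) throw new EOFException("truncated record body")
  val payload = new Array[Byte](size)
  buffer.get(payload)
  payload
}

def parseFetchedPartition(buffer: ByteBuffer): Seq[Array[Byte]] = {
  val parsed = scala.collection.mutable.ArrayBuffer[Array[Byte]]()
  try {
    while (buffer.hasRemaining) parsed += readOneRecord(buffer)
  } catch {
    // Kafka treats an EOF while reading the buffer as "no more complete
    // records"; a payload truncated by throttling lands on the same path.
    case _: EOFException =>
  }
  if (parsed.isEmpty && buffer.limit() > 0) {
    // Nothing parsed from a non-empty buffer: the client concludes the first
    // record must be larger than the fetch size.
    throw new RecordTooLargeException(
      "There are some messages whose size is larger than the fetch size")
  }
  parsed.toSeq
}
```

With a truncated payload the loop parses zero records, the buffer is still non-empty, and the client reports RecordTooLargeException even though no genuinely oversized message exists.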

step3: Having confirmed that the message content was being truncated and that the Kafka client therefore failed to read the data, my fix was to catch the exception inside Spark's Kafka source implementation and retry the fetch. I won't go into the details of the Spark implementation itself; my changes are as follows:

The file to modify is org.apache.spark.sql.kafka010.KafkaDataConsumer. Create an org.apache.spark.sql.kafka010 package in your own project, copy KafkaDataConsumer into it, and modify the following function (a retry sketch follows the signature):

    get(
        offset: Long,
        untilOffset: Long,
        pollTimeoutMs: Long,
        failOnDataLoss: Boolean): ConsumerRecord[Array[Byte], Array[Byte]]
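A minimal sketch of the retry wrapper, assuming the existing fetch logic can be called as a helper; getWithRetry, fetchData, and maxRetries are placeholder names, not the actual members of KafkaDataConsumer.

```scala
import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.kafka.common.errors.RecordTooLargeException

// Would live inside the copied KafkaDataConsumer; shown standalone for clarity.
def getWithRetry(
    offset: Long,
    untilOffset: Long,
    pollTimeoutMs: Long,
    failOnDataLoss: Boolean,
    maxRetries: Int = 3): ConsumerRecord[Array[Byte], Array[Byte]] = {
  var attempt = 0
  var result: ConsumerRecord[Array[Byte], Array[Byte]] = null
  var done = false
  while (!done) {
    try {
      // In the real KafkaDataConsumer this is the existing fetch/poll logic.
      result = fetchData(offset, untilOffset, pollTimeoutMs, failOnDataLoss)
      done = true
    } catch {
      case _: RecordTooLargeException if attempt < maxRetries =>
        // The record was most likely truncated by throttling, not genuinely
        // oversized; back off briefly and fetch the same offset again.
        attempt += 1
        Thread.sleep(1000L * attempt)
    }
  }
  result
}

// Placeholder for the original fetch logic inside KafkaDataConsumer.
def fetchData(
    offset: Long,
    untilOffset: Long,
    pollTimeoutMs: Long,
    failOnDataLoss: Boolean): ConsumerRecord[Array[Byte], Array[Byte]] = ???
```

Once the retries are exhausted the exception propagates as before, so failOnDataLoss semantics are preserved for genuinely oversized messages.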

 

Attachments:

Screenshots for step 2:
