Spark Streaming Kafka: "Failed to get records for ... after polling for 512"


The fix suggested last time for this error was to tune the heartbeat.interval.ms and session.timeout.ms parameters, but that turned out not to work well: the error still showed up.
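For reference, that earlier attempt looked roughly like this (a sketch only; the broker list, group id, and the exact timeout values below are placeholders, not values from the original post):

import org.apache.kafka.common.serialization.StringDeserializer

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker1:9092",              // placeholder broker address
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "my-streaming-group",                 // placeholder group id
  // heartbeat must be lower than the session timeout; these values are illustrative
  "heartbeat.interval.ms" -> (10000: java.lang.Integer),
  "session.timeout.ms" -> (30000: java.lang.Integer)
)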

Walking the stack trace in the error log back through the source code reveals where the problem lies. The failing frame is:

 at org.apache.spark.streaming.kafka010.CachedKafkaConsumer.get(CachedKafkaConsumer.scala:74)

Look at the get method of the CachedKafkaConsumer class:

/**
   * Get the record for the given offset, waiting up to timeout ms if IO is necessary.
   * Sequential forward access will use buffers, but random access will be horribly inefficient.
   */
  def get(offset: Long, timeout: Long): ConsumerRecord[K, V] = {
    logDebug(s"Get $groupId $topic $partition nextOffset $nextOffset requested $offset")
    if (offset != nextOffset) {
      logInfo(s"Initial fetch for $groupId $topic $partition $offset")
      seek(offset)
      poll(timeout)
    }

    if (!buffer.hasNext()) { poll(timeout) }
    assert(buffer.hasNext(),
      s"Failed to get records for $groupId $topic $partition $offset after polling for $timeout")
    var record = buffer.next()

    if (record.offset != offset) {
      logInfo(s"Buffer miss for $groupId $topic $partition $offset")
      seek(offset)
      poll(timeout)
      assert(buffer.hasNext(),
        s"Failed to get records for $groupId $topic $partition $offset after polling for $timeout")
      record = buffer.next()
      assert(record.offset == offset,
        s"Got wrong record for $groupId $topic $partition even after seeking to offset $offset")
    }

    nextOffset = offset + 1
    record
  }
This method fetches the record at a specific offset. Sequential forward reads are served from the buffer and are efficient, effectively a batched fetch; random access, by contrast, is very inefficient. Note that if poll(timeout) returns nothing within the timeout, buffer.hasNext() stays false and the assert throws exactly the "Failed to get records ... after polling for ..." message from the title.

Moving one frame down the stack: at org.apache.spark.streaming.kafka010.KafkaRDD$KafkaRDDIterator.next(KafkaRDD.scala:193)

The next method of KafkaRDDIterator calls the get method above, passing pollTimeout as the timeout. Its default is 512 ms, which is the 512 in the error message:

override def next(): ConsumerRecord[K, V] = {
  assert(hasNext(), "Can't call getNext() once untilOffset has been reached")
  val r = consumer.get(requestOffset, pollTimeout)
  requestOffset += 1
  r
}

pollTimeout is read from the Spark configuration in KafkaRDD, defaulting to 512 ms:

pollTimeout = conf.getLong("spark.streaming.kafka.consumer.poll.ms", 512)
Set spark.streaming.kafka.consumer.poll.ms to 10000 when initializing the SparkConf and the error no longer appears. The only way it should still surface is if your Kafka brokers are unstable enough that even a 10-second poll times out.
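A minimal sketch of that fix (the app name and batch interval are placeholders):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("kafka-streaming-app")              // placeholder app name
  // raise the consumer poll timeout from the 512 ms default to 10 s
  .set("spark.streaming.kafka.consumer.poll.ms", "10000")
val ssc = new StreamingContext(conf, Seconds(10)) // placeholder batch interval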

 
