Spark Streaming Kafka: "Failed to get records for ... after polling for 512"


The fix suggested last time for this error was to tune the heartbeat.interval.ms and session.timeout.ms parameters, but that turned out not to work well: the error still showed up.
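For reference, that earlier attempt looked roughly like this (a sketch only; the broker list, group id, and the exact timeout values below are placeholders, not values from the original post):

import org.apache.kafka.common.serialization.StringDeserializer

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker1:9092",              // placeholder broker address
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "my-streaming-group",                 // placeholder group id
  // heartbeat must be lower than the session timeout; these values are illustrative
  "heartbeat.interval.ms" -> (10000: java.lang.Integer),
  "session.timeout.ms" -> (30000: java.lang.Integer)
)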

Walking the stack trace in the error log back through the source code reveals where the problem lies. The failing frame is:

 at org.apache.spark.streaming.kafka010.CachedKafkaConsumer.get(CachedKafkaConsumer.scala:74)

Look at the get method of the CachedKafkaConsumer class:

/**
   * Get the record for the given offset, waiting up to timeout ms if IO is necessary.
   * Sequential forward access will use buffers, but random access will be horribly inefficient.
   */
  def get(offset: Long, timeout: Long): ConsumerRecord[K, V] = {
    logDebug(s"Get $groupId $topic $partition nextOffset $nextOffset requested $offset")
    if (offset != nextOffset) {
      logInfo(s"Initial fetch for $groupId $topic $partition $offset")
      seek(offset)
      poll(timeout)
    }

    if (!buffer.hasNext()) { poll(timeout) }
    assert(buffer.hasNext(),
      s"Failed to get records for $groupId $topic $partition $offset after polling for $timeout")
    var record = buffer.next()

    if (record.offset != offset) {
      logInfo(s"Buffer miss for $groupId $topic $partition $offset")
      seek(offset)
      poll(timeout)
      assert(buffer.hasNext(),
        s"Failed to get records for $groupId $topic $partition $offset after polling for $timeout")
      record = buffer.next()
      assert(record.offset == offset,
        s"Got wrong record for $groupId $topic $partition even after seeking to offset $offset")
    }

    nextOffset = offset + 1
    record
  }
This method fetches the record at a specific offset. Sequential forward reads are served from the buffer and are efficient, effectively a batched fetch; random access, by contrast, is very inefficient. Note that if poll(timeout) returns nothing within the timeout, buffer.hasNext() stays false and the assert throws exactly the "Failed to get records ... after polling for ..." message from the title.

Moving one frame down the stack: at org.apache.spark.streaming.kafka010.KafkaRDD$KafkaRDDIterator.next(KafkaRDD.scala:193)

The next method of KafkaRDDIterator calls the get method above, passing pollTimeout as the timeout. Its default is 512 ms, which is the 512 in the error message:

override def next(): ConsumerRecord[K, V] = {
  assert(hasNext(), "Can't call getNext() once untilOffset has been reached")
  val r = consumer.get(requestOffset, pollTimeout)
  requestOffset += 1
  r
}

pollTimeout is read from the Spark configuration in KafkaRDD, defaulting to 512 ms:

pollTimeout = conf.getLong("spark.streaming.kafka.consumer.poll.ms", 512)
Set spark.streaming.kafka.consumer.poll.ms to 10000 when initializing the SparkConf and the error no longer appears. The only way it should still surface is if your Kafka brokers are unstable enough that even a 10-second poll times out.
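A minimal sketch of that fix (the app name and batch interval are placeholders):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("kafka-streaming-app")              // placeholder app name
  // raise the consumer poll timeout from the 512 ms default to 10 s
  .set("spark.streaming.kafka.consumer.poll.ms", "10000")
val ssc = new StreamingContext(conf, Seconds(10)) // placeholder batch interval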

 
