[Spark 89] Testing Spark Streaming when processing speed lags behind the read rate


axxbc123



Article link: https://blog.csdn.net/axxbc123/article/details/84710168


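This article examines a test of Spark Streaming in which processing falls behind the rate at which data is read. In the test, one RDD is created every second, but each one takes 4 seconds to process, which leads to a mismatch in the waiting-batch count. The UI shows that Spark Streaming ran for 95 seconds in total and processed 23 batches, each taking about 4 seconds on average. The article cites Tathagata Das to explain the waiting-batch count and stresses the importance of processing time and scheduling delay. If the processing time exceeds the batch interval, reducing the processing time may need to be considered.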


1. Test code

package spark.examples.streaming

import org.apache.spark.SparkConf
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.streaming._

object NetCatStreamingWordCountDelay {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("NetCatStreamingWordCountDelay")
    conf.setMaster("local[3]")
    // Batch interval of 1 second: one RDD is created per second
    val ssc = new StreamingContext(conf, Seconds(1))
    val lines = ssc.socketTextStream("192.168.26.140", 9999)
    // Simulate slow processing: each batch takes about 4 seconds
    lines.foreachRDD(rdd => {
      // Runs on the driver for every batch, even when the RDD is empty
      println("This is the output even if rdd is empty")
      Thread.sleep(4 * 1000)
    })
    ssc.start()
    ssc.awaitTermination()
  }
}

In the test code above:

1. The batch interval is set to 1 second, which means that Spark Streaming creates one RDD every second.
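
To make concrete what a 1-second batch interval combined with a 4-second sleep implies, here is a rough back-of-the-envelope sketch in plain Scala. It is not Spark code and does not reflect how Spark schedules jobs internally. The object name `BacklogSketch` and the function `backlog` are hypothetical names introduced only for this illustration, and the model assumes exactly one batch per interval and strictly sequential processing. The 95-second run time is the figure reported for this test.

```scala
// Idealized model of batch backlog (an assumption-laden sketch, not Spark internals).
object BacklogSketch {

  /** Returns (batches created, batches completed, batches not yet completed)
    * after `elapsedSec` seconds, assuming one batch is created per interval
    * and batches are processed one at a time. */
  def backlog(elapsedSec: Int, batchIntervalSec: Int, processingSec: Int): (Int, Int, Int) = {
    val created = elapsedSec / batchIntervalSec
    // Completed batches can never exceed the number of batches created.
    val completed = math.min(created, elapsedSec / processingSec)
    (created, completed, created - completed)
  }

  def main(args: Array[String]): Unit = {
    // Values from the test: 1 s batch interval, 4 s per batch, 95 s total run time.
    val (created, completed, pending) =
      backlog(elapsedSec = 95, batchIntervalSec = 1, processingSec = 4)
    println(s"created=$created completed=$completed pending=$pending")
  }
}
```

Under these assumptions, 95 batches are created but only 23 complete, which agrees with the 23 processed batches reported for this run; the rest are still waiting or in progress, and the gap keeps growing for as long as the processing time exceeds the batch interval.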