The updateStateByKey Operation in Spark Streaming

feige1990

Article link: https://blog.csdn.net/feige1990/article/details/48634557

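updateStateByKey updates the state of a DStream using a function you supply and returns a new "state" DStream. It lets you maintain arbitrary state per key while continuously updating that state with new information from the stream.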
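The per-batch behavior can be sketched in plain Scala, without a cluster. The helper `applyBatch` below is hypothetical and not a Spark API; it only mimics what conceptually happens to the state in each batch interval, assuming the update function is invoked for every key that has either new values or existing state.

```scala
object UpdateStateSketch {
  // Hypothetical helper (not part of Spark): applies one batch of (key, value)
  // records to the previous state, roughly as updateStateByKey does per batch.
  def applyBatch[K, V, S](
      previous: Map[K, S],
      batch: Seq[(K, V)],
      update: (Seq[V], Option[S]) => Option[S]): Map[K, S] = {
    // Group the new values of this batch by key.
    val newValues: Map[K, Seq[V]] =
      batch.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }
    // Visit every key that has new values or existing state.
    val keys = (previous.keySet ++ newValues.keySet).toSeq
    keys.flatMap { k =>
      update(newValues.getOrElse(k, Seq.empty[V]), previous.get(k)).map(s => k -> s)
    }.toMap
  }

  // Running count: add this batch's values to the previous count.
  val countUpdate: (Seq[Int], Option[Int]) => Option[Int] =
    (values, state) => Some(values.sum + state.getOrElse(0))

  def main(args: Array[String]): Unit = {
    val afterBatch1 =
      applyBatch(Map.empty[String, Int], Seq("a" -> 1, "b" -> 1, "a" -> 1), countUpdate)
    val afterBatch2 = applyBatch(afterBatch1, Seq("b" -> 1), countUpdate)
    println(afterBatch1)
    println(afterBatch2)
  }
}
```

After the first batch the state holds a count of 2 for "a" and 1 for "b"; after the second batch "b" rises to 2, while "a" keeps its count even though it received no new values.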

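Using it takes two steps:

1. Define the state. The state can be of any data type.
2. Define the state update function. This function specifies how to compute the new state from the previous state and the new values received from the input stream.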
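To illustrate that the state is not limited to integers, here is a sketch of an update function whose state is a small case class tracking a running average. `AvgState` and `updateAvg` are hypothetical names chosen for illustration; the function has the `(Seq[V], Option[S]) => Option[S]` shape used by the per-key update function in this post.

```scala
object RunningAverageSketch {
  // Hypothetical state type: total of all values seen so far and their count.
  final case class AvgState(sum: Double, count: Long) {
    def mean: Double = if (count == 0) 0.0 else sum / count
  }

  // Update function: merge this batch's values into the previous state.
  val updateAvg: (Seq[Double], Option[AvgState]) => Option[AvgState] =
    (values, state) => {
      val prev = state.getOrElse(AvgState(0.0, 0L))
      Some(AvgState(prev.sum + values.sum, prev.count + values.size))
    }

  def main(args: Array[String]): Unit = {
    val s1 = updateAvg(Seq(2.0, 4.0), None) // first batch, no previous state
    val s2 = updateAvg(Seq(6.0), s1)        // second batch builds on s1
    println(s2.map(_.mean))                 // prints Some(4.0)
  }
}
```

In a Spark job, a function of this shape would be passed to `updateStateByKey` on a DStream of `(key, Double)` pairs. Because the state is checkpointed, the state type is assumed to be serializable, which case classes are.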

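Here is an example. Suppose you want to maintain a running count of each word seen in a text data stream. The running count is the state, and its type is an integer.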

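import org.apache.spark.{HashPartitioner, SparkConf}
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StatefulNetworkWordCount {
  def main(args: Array[String]): Unit = {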
    if (args.length < 2) {
      System.err.println("Usage: StatefulNetworkWordCount <hostname> <port>")
      System.exit(1)
    }

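    // StreamingExamples is a helper from Spark's examples package
    // (org.apache.spark.examples.streaming); it only adjusts log levels
    StreamingExamples.setStreamingLogLevels()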

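    // Per-key update: add the counts from this batch to the previous total
    val updateFunc = (values: Seq[Int], state: Option[Int]) => {
      val currentCount = values.sum
      val previousCount = state.getOrElse(0)
      Some(currentCount + previousCount)
    }

    // Iterator form expected by the updateStateByKey overload used below:
    // each element is (key, new values in this batch, previous state)
    val newUpdateFunc = (iterator: Iterator[(String, Seq[Int], Option[Int])]) => {
      iterator.flatMap(t => updateFunc(t._2, t._3).map(s => (t._1, s)))
    }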

    val sparkConf = new SparkConf().setAppName("StatefulNetworkWordCount")
    // Create the context with a 1 second batch interval
    val ssc = new StreamingContext(sparkConf, Seconds(1))
    // updateStateByKey requires a checkpoint directory; "." is the current directory
    ssc.checkpoint(".")

    // Initial RDD input to updateStateByKey
    val initialRDD = ssc.sparkContext.parallelize(List(("hello", 1), ("world", 1)))

    // Create a ReceiverInputDStream on target ip:port and count the
    // words in the input stream of \n delimited text (e.g. generated by 'nc')
    val lines = ssc.socketTextStream(args(0), args(1).toInt)
    val words = lines.flatMap(_.split(" "))
    val wordDstream = words.map(x => (x, 1))

    // Update the cumulative count using updateStateByKey.
    // This gives a DStream made of state (the cumulative count of each word).
    // The `true` argument is rememberPartitioner; initialRDD seeds the state.
    val stateDstream = wordDstream.updateStateByKey[Int](newUpdateFunc,
      new HashPartitioner(ssc.sparkContext.defaultParallelism), true, initialRDD)
    stateDstream.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
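
The iterator form of the update function can be tried outside Spark. The sketch below copies `updateFunc` and `newUpdateFunc` from the listing above and feeds them a hand-built iterator. The sample tuples are made up for illustration and stand in for what Spark would supply: the key, the new values for that key in the current batch, and the previous state.

```scala
object IteratorUpdateSketch {
  // Same per-key logic as the listing: new total = batch sum + previous total.
  val updateFunc: (Seq[Int], Option[Int]) => Option[Int] =
    (values, state) => Some(values.sum + state.getOrElse(0))

  // Same iterator wrapper as the listing: emits (key, new state) pairs.
  val newUpdateFunc = (iterator: Iterator[(String, Seq[Int], Option[Int])]) =>
    iterator.flatMap(t => updateFunc(t._2, t._3).map(s => (t._1, s)))

  def main(args: Array[String]): Unit = {
    // Made-up input: "hello" and "world" start at 1 (as in initialRDD),
    // "spark" is a new key with no previous state.
    val input: Iterator[(String, Seq[Int], Option[Int])] = Iterator(
      ("hello", Seq(1, 1), Some(1)),
      ("world", Seq.empty[Int], Some(1)),
      ("spark", Seq(1), None)
    )
    newUpdateFunc(input).foreach(println)
  }
}
```

With this input, "hello" moves from 1 to 3, "world" stays at 1 because it has no new values, and "spark" enters the state with a count of 1.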