A Simple Understanding of DStream.foreachRDD

大大大大大大太阳

Article link: https://blog.csdn.net/qq_40341628/article/details/88311642

The official documentation describes foreachRDD(func) as follows:

The most generic output operator that applies a function, func, to each RDD generated from the stream. This function should push the data in each RDD to an external system, such as saving the RDD to files, or writing it over the network to a database. Note that the function func is executed in the driver process running the streaming application, and will usually have RDD actions in it that will force the computation of the streaming RDDs.

In short: foreachRDD is the most general output operation. func is applied to every RDD the stream generates, and it runs in the driver process. The RDD actions that func usually contains are what force the streaming RDDs to be computed.
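To make the driver/executor split in that description concrete, here is a minimal sketch. It assumes Spark Streaming is on the classpath, runs in local mode, and uses a queue-backed test stream in place of a real source. The object name and the println calls that stand in for "writing to an external system" are illustrative only.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}
import scala.collection.mutable

object ForeachRddSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: local mode with two threads and 1-second batches.
    val conf = new SparkConf().setMaster("local[2]").setAppName("ForeachRddSketch")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Assumption: a queue of pre-built RDDs stands in for a real input source.
    val queue = mutable.Queue[RDD[Int]](
      ssc.sparkContext.parallelize(1 to 10),
      ssc.sparkContext.parallelize(11 to 20))
    val stream = ssc.queueStream(queue, oneAtATime = true)

    stream.foreachRDD { rdd =>
      // This part of func runs in the driver, once per batch.
      println(s"driver: got an RDD with ${rdd.getNumPartitions} partition(s)")

      // foreachPartition is an RDD action: it triggers the job,
      // and the closure below runs on the executors.
      rdd.foreachPartition { records =>
        // Hypothetical: a real job would open one connection per partition here.
        records.foreach(r => println(s"executor: would write $r"))
      }
    }

    ssc.start()
    ssc.awaitTerminationOrTimeout(5000) // let a few batches run
    ssc.stop(stopSparkContext = true, stopGracefully = false)
  }
}
```

Without an action such as foreachPartition inside func, only the driver-side println would run and no data would actually be processed.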

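This definition raises a question: how many RDDs does a DStream produce within one batch interval?
Answer: exactly one.
How, then, should "each RDD" in the definition be understood?
A DStream can be thought of as time-based: every batch interval yields one RDD, so along the time axis a new RDD appears at each interval. "each RDD" therefore means the RDD of each interval, not each of several RDDs within a single interval.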
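One way to check this empirically is to record the batch time of every foreachRDD invocation and confirm that no time appears twice. The sketch below is only an illustration under stated assumptions: local mode, a queue-backed test stream, and the hypothetical names OneRddPerBatch and run. It uses the (RDD[T], Time) => Unit overload of foreachRDD.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext, Time}
import scala.collection.mutable

object OneRddPerBatch {
  // Returns (batch time, record count) for every foreachRDD invocation observed.
  def run(): Seq[(Time, Long)] = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("OneRddPerBatch")
    val ssc = new StreamingContext(conf, Seconds(1))
    val sc = ssc.sparkContext

    // Assumption: three queued RDDs, handed out one per batch interval.
    val queue = mutable.Queue[RDD[Int]](
      sc.parallelize(1 to 3), sc.parallelize(4 to 6), sc.parallelize(7 to 9))
    val stream = ssc.queueStream(queue, oneAtATime = true)

    // Driver-local buffer: mutating it is fine because func runs in the driver.
    val seen = mutable.ArrayBuffer.empty[(Time, Long)]
    stream.foreachRDD { (rdd: RDD[Int], time: Time) =>
      val n = rdd.count() // the action that forces this batch's RDD
      seen.synchronized { seen += ((time, n)) }
    }

    ssc.start()
    ssc.awaitTerminationOrTimeout(5000)
    ssc.stop(stopSparkContext = true, stopGracefully = false)
    seen.synchronized { seen.toList }
  }

  def main(args: Array[String]): Unit = {
    val calls = run()
    calls.foreach { case (t, n) => println(s"$t -> $n records") }
    // Each batch time should show up exactly once.
    assert(calls.map(_._1).distinct.size == calls.size)
  }
}
```

Once the queue is drained, later batches still invoke the function, each with an empty RDD, so the number of invocations follows the number of elapsed intervals rather than the number of queued RDDs.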
Analysis from the Spark source code
The foreachRDD method of DStream eventually calls the following code:

private def foreachRDD(
    foreachFunc: (RDD[T], Time) => Unit,
    displayInnerRDDOps: Boolean): Unit = {
  new ForEachDStream(this,
    context.sparkContext.clean(foreachFunc, false), displayInnerRDDOps).register()
}

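As you can see, no Iterator appears anywhere in this method. Compare it with the foreach and foreachPartition methods of RDD: both of them traverse the RDD's elements, which is why they reference an Iterator: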

def foreach(f: T => Unit): Unit = withScope {
  val cleanF = sc.clean(f)
  sc.runJob(this, (iter: Iterator[T]) => iter.foreach(cleanF))
}

def foreachPartition(f: Iterator[T] => Unit): Unit = withScope {
  val cleanF = sc.clean(f)
  sc.runJob(this, (iter: Iterator[T]) => cleanF(iter))
}

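If each interval contained several RDDs, foreachRDD in DStream would likewise have to reference an Iterator (or some other collection) in order to walk over them, but the code above contains nothing of the kind. The function it accepts, (RDD[T], Time) => Unit, takes a single RDD together with the batch time.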
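The same argument can be modeled in a few lines of plain Scala. This is a toy model, not Spark's real classes: ToyRDD, ToyDStream, ToyForEachDStream, and ToyDemo are all hypothetical names. It is loosely based on my reading of ForEachDStream.generateJob, which, as far as I can tell, asks its parent for the (optional) RDD of a given batch time and wraps a call to the user function in a job.

```scala
import scala.collection.mutable

// Toy stand-in for an RDD: just a bag of elements.
final case class ToyRDD[T](elems: Seq[T])

// A stream yields at most ONE RDD for a given batch time, hence Option, not Iterator.
trait ToyDStream[T] {
  def getOrCompute(time: Long): Option[ToyRDD[T]]
}

final class ToyForEachDStream[T](
    parent: ToyDStream[T],
    foreachFunc: (ToyRDD[T], Long) => Unit) {
  // One job per batch time, wrapping one call to foreachFunc with one RDD.
  def generateJob(time: Long): Option[() => Unit] =
    parent.getOrCompute(time).map(rdd => () => foreachFunc(rdd, time))
}

object ToyDemo {
  // Returns (batch time, RDD size) for every call made to the user function.
  def run(): Seq[(Long, Int)] = {
    val source = new ToyDStream[Int] {
      def getOrCompute(time: Long): Option[ToyRDD[Int]] =
        Some(ToyRDD(Seq.fill(3)(time.toInt)))
    }
    val calls = mutable.ArrayBuffer.empty[(Long, Int)]
    val out = new ToyForEachDStream[Int](
      source, (rdd, t) => calls += ((t, rdd.elems.size)))

    // The scheduler asks for one job per batch time and runs it.
    (1L to 3L).foreach { t => out.generateJob(t).foreach(job => job()) }
    calls.toList
  }

  def main(args: Array[String]): Unit =
    run().foreach { case (t, n) => println(s"time=$t size=$n") }
}
```

In this model the user function is called once per batch time, and nothing in the output path ever iterates over a collection of RDDs.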
