SparkCore: map vs. mapPartitions and foreach vs. foreachPartition, compared by connecting to a database

11号车厢

Category: Spark2   Tags: Spark2

Article link: https://blog.csdn.net/greenplum_xiaofan/article/details/98470719

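This article looks at the foreach and foreachPartition operations in Spark and how they parallel map and mapPartitions. The focus is on efficiency. With foreach, the function runs once per element, so if it opens a database connection, a connection is opened for every record written, which wastes resources. foreachPartition is more efficient: its function runs once per partition, so one connection can serve all records in that partition and far fewer connections are opened, although handling a whole partition at once may increase memory pressure. The suggested remedy is to adjust the number of partitions to balance efficiency against memory use.

(Summary generated automatically by CSDN.)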
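The partition-count trade-off mentioned in the summary can be put into numbers with a small sketch in plain Scala, without Spark. `PartitionTradeoff`, `split`, and `cost` are hypothetical names invented for this illustration. The sketch assumes one connection per partition and uses the largest partition size only as a rough proxy for memory pressure, which matters mainly when a partition is buffered rather than streamed.

```scala
object PartitionTradeoff {
  // Toy stand-in for repartitioning: cut the records into at most
  // numPartitions chunks of roughly equal size.
  def split[T](records: Seq[T], numPartitions: Int): Seq[Seq[T]] = {
    require(numPartitions > 0, "numPartitions must be positive")
    val chunkSize = math.max(1, math.ceil(records.size.toDouble / numPartitions).toInt)
    records.grouped(chunkSize).toList
  }

  // Returns (connections opened, size of the largest partition),
  // assuming exactly one connection is opened per partition.
  def cost[T](partitions: Seq[Seq[T]]): (Int, Int) = {
    val largest = if (partitions.isEmpty) 0 else partitions.map(_.size).max
    (partitions.size, largest)
  }

  def main(args: Array[String]): Unit = {
    val records = 1 to 1000
    println(cost(split(records, 4)))   // few connections, large partitions
    println(cost(split(records, 100))) // many connections, small partitions
  }
}
```

For 1,000 records, 4 partitions cost 4 connections with 250 records each, while 100 partitions cost 100 connections with 10 records each. In a real Spark job the partition count would be changed with `repartition` or `coalesce` on the RDD.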

Source code

  // Actions (launch a job to return a value to the user program)
  /**
   * Applies a function f to all elements of this RDD.
   */
  def foreach(f: T => Unit): Unit = withScope {
    val cleanF = sc.clean(f)
    sc.runJob(this, (iter: Iterator[T]) => iter.foreach(cleanF))
  }

  /**
   * Applies a function f to each partition of this RDD.
   */
  def foreachPartition(f: Iterator[T] => Unit): Unit = withScope {
    val cleanF = sc.clean(f)
    sc.runJob(this, (iter: Iterator[T]) => cleanF(iter))
  }
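
To make the difference between the two signatures concrete, here is a minimal sketch in plain Scala that does not depend on Spark. `FakeRdd`, `FakeConnection`, `Counter`, and the private `runJob` are hypothetical stand-ins invented for this illustration, not Spark classes. They only mimic the shape of the source above: the job hands each partition's iterator to a function, `foreach` loops over that iterator element by element, and `foreachPartition` passes the iterator to the user function directly. The sketch counts how many connections are opened when the connection is created inside the user function.

```scala
object ForeachSketch {
  // Tallies what the fake connections do.
  final class Counter { var opened = 0; var closed = 0; var rows = 0 }

  // Hypothetical stand-in for a database connection; it only updates the counter.
  final class FakeConnection(counter: Counter) {
    counter.opened += 1
    def insert(x: Int): Unit = counter.rows += 1
    def close(): Unit = counter.closed += 1
  }

  // Toy model of an RDD: a fixed list of partitions (not a Spark class).
  final class FakeRdd[T](partitions: Seq[Seq[T]]) {
    // Mimics the role of sc.runJob: call func once per partition.
    private def runJob(func: Iterator[T] => Unit): Unit =
      partitions.foreach(p => func(p.iterator))

    def foreach(f: T => Unit): Unit = runJob(iter => iter.foreach(f))
    def foreachPartition(f: Iterator[T] => Unit): Unit = runJob(iter => f(iter))
  }

  // Connection opened inside foreach: one connection per element.
  def perElement(rdd: FakeRdd[Int]): Counter = {
    val c = new Counter
    rdd.foreach { x =>
      val conn = new FakeConnection(c)
      try conn.insert(x) finally conn.close()
    }
    c
  }

  // Connection opened inside foreachPartition: one connection per partition.
  def perPartition(rdd: FakeRdd[Int]): Counter = {
    val c = new Counter
    rdd.foreachPartition { iter =>
      val conn = new FakeConnection(c)
      try iter.foreach(conn.insert) finally conn.close()
    }
    c
  }

  def main(args: Array[String]): Unit = {
    val rdd = new FakeRdd(Seq(Seq(1, 2, 3), Seq(4, 5), Seq(6, 7, 8, 9)))
    val a = perElement(rdd)
    val b = perPartition(rdd)
    println(s"foreach: ${a.opened} connections for ${a.rows} rows")
    println(s"foreachPartition: ${b.opened} connections for ${b.rows} rows")
  }
}
```

With 9 records spread over 3 partitions, the per-element version opens 9 connections and the per-partition version opens 3, while both write all 9 rows. In real Spark code the same pattern would sit inside `rdd.foreachPartition { iter => ... }`, with an actual JDBC connection in place of `FakeConnection`.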
