Basic Principles of Transformation Operators (Part 1)

Author: huyang0101

Series: Spark operator fundamentals. Tags: spark, big data

Article link: https://blog.csdn.net/huyang0101/article/details/121917978

Preface

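This article uses the source code to explain the basic principles of five operators: mapPartitions, mapPartitionsWithIndex, map, flatMap, and filter. To understand them you first need to understand MapPartitionsRDD, because each of the five is implemented by wrapping a per-partition function in a MapPartitionsRDD. I covered that class in an earlier article, "Basic Principles of MapPartitionsRDD", so I will not repeat it here.
Below is the source code of the five operators.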
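To make "all five operators are MapPartitionsRDD underneath" concrete, here is a minimal, Spark-free sketch in Scala (2.13+ or 3, because it uses `IterableOnce`). It is not Spark's code: `MiniRDD`, `MiniMapPartitionsRDD` and `MiniParallelRDD` are hypothetical classes that keep partitions in memory. The single primitive applies a function `(Int, Iterator[T]) => Iterator[U]` to each parent partition; the real MapPartitionsRDD additionally passes a `TaskContext`, which this sketch drops. The five operators are then thin wrappers around that primitive.

```scala
// Hypothetical, Spark-free model: an "RDD" is a fixed number of partitions,
// and computePartition(i) yields the elements of partition i.
abstract class MiniRDD[T] {
  def numPartitions: Int
  def computePartition(index: Int): Iterator[T]

  // Stand-in for collect(): concatenate all partitions in index order.
  def collect(): List[T] =
    (0 until numPartitions).toList.flatMap(i => computePartition(i).toList)

  // The one primitive: f sees the partition index and the partition iterator.
  def mapPartitionsWithIndex[U](f: (Int, Iterator[T]) => Iterator[U]): MiniRDD[U] =
    new MiniMapPartitionsRDD[U, T](this, f)

  // mapPartitions ignores the index and hands f the whole iterator.
  def mapPartitions[U](f: Iterator[T] => Iterator[U]): MiniRDD[U] =
    new MiniMapPartitionsRDD[U, T](this, (_, iter) => f(iter))

  // map, flatMap and filter lift an element-wise function onto the iterator.
  def map[U](f: T => U): MiniRDD[U] =
    new MiniMapPartitionsRDD[U, T](this, (_, iter) => iter.map(f))

  def flatMap[U](f: T => IterableOnce[U]): MiniRDD[U] =
    new MiniMapPartitionsRDD[U, T](this, (_, iter) => iter.flatMap(f))

  def filter(p: T => Boolean): MiniRDD[T] =
    new MiniMapPartitionsRDD[T, T](this, (_, iter) => iter.filter(p))
}

// Hypothetical counterpart of MapPartitionsRDD: same partition count as the
// parent; partition i is f applied to the parent's partition i.
class MiniMapPartitionsRDD[U, T](
    prev: MiniRDD[T],
    f: (Int, Iterator[T]) => Iterator[U]) extends MiniRDD[U] {
  def numPartitions: Int = prev.numPartitions
  def computePartition(index: Int): Iterator[U] = f(index, prev.computePartition(index))
}

// Hypothetical source RDD backed by in-memory partitions (stand-in for parallelize).
class MiniParallelRDD[T](parts: Seq[Seq[T]]) extends MiniRDD[T] {
  def numPartitions: Int = parts.length
  def computePartition(index: Int): Iterator[T] = parts(index).iterator
}

val source = new MiniParallelRDD(Seq(Seq(1, 2, 3), Seq(4, 5)))
val result = source
  .map(_ * 10)                     // p0: 10,20,30     p1: 40,50
  .filter(_ != 20)                 // p0: 10,30        p1: 40,50
  .flatMap(x => Seq(x, x + 1))     // p0: 10,11,30,31  p1: 40,41,50,51
  .mapPartitionsWithIndex((i, it) => it.map(x => s"p$i:$x"))

println(result.collect())
// List(p0:10, p0:11, p0:30, p0:31, p1:40, p1:41, p1:50, p1:51)
```

Chaining the operators never changes the partition count; each step only wraps the previous partition's iterator lazily. In Spark's RDD.scala, map, flatMap and filter follow the same pattern (clean the function with sc.clean, then build a MapPartitionsRDD that calls iter.map, iter.flatMap or iter.filter), and filter additionally sets preservesPartitioning = true, since dropping elements cannot move a key to another partition. It is worth checking this against the Spark version you are reading.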

Source code

mapPartitions

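  def mapPartitions[U: ClassTag](
      f: Iterator[T] => Iterator[U],
      preservesPartitioning: Boolean = false): RDD[U] = withScope {
    // Check the argument f (for example, that it is serializable) and return
    // a cleaned wrapper, cleanedF, that behaves exactly like f.
    val cleanedF = sc.clean(f)
    // Wrap f in a MapPartitionsRDD: the TaskContext and partition index are
    // ignored, and the partition's iterator is passed straight to cleanedF.
    new MapPartitionsRDD(
      this,
      (_: TaskContext, _: Int, iter: Iterator[T]) => cleanedF(iter),
      preservesPartitioning)
  }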
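The closure handed to MapPartitionsRDD in mapPartitions discards the TaskContext and the partition index and forwards only the iterator to cleanedF. One consequence is that the user function runs once per partition, not once per element, which is why mapPartitions suits per-partition setup such as opening a connection. The sketch below illustrates this without Spark: `FakeContext` is a hypothetical stand-in for `TaskContext`, `adaptToPartitionFunc` is a hypothetical helper that builds a closure of the same shape, and plain Scala collections play the role of partitions.

```scala
// Hypothetical stand-in for org.apache.spark.TaskContext (not Spark's class).
final case class FakeContext(stageId: Int)

// Same shape as the closure built inside mapPartitions: context and partition
// index are ignored; only the iterator is forwarded to f.
def adaptToPartitionFunc[T, U](
    f: Iterator[T] => Iterator[U]): (FakeContext, Int, Iterator[T]) => Iterator[U] =
  (_: FakeContext, _: Int, iter: Iterator[T]) => f(iter)

// Two in-memory "partitions".
val partitions: Seq[Seq[Int]] = Seq(Seq(1, 2, 3), Seq(4, 5))

var setupCalls = 0 // counts how often the per-partition body runs
val addOffset: Iterator[Int] => Iterator[Int] = iter => {
  setupCalls += 1   // expensive setup (e.g. a connection) would happen here
  val offset = 100
  iter.map(_ + offset)
}

val partitionFunc = adaptToPartitionFunc(addOffset)
val output: Seq[List[Int]] = partitions.zipWithIndex.map { case (part, idx) =>
  partitionFunc(FakeContext(stageId = 0), idx, part.iterator).toList
}

println(output)     // List(List(101, 102, 103), List(104, 105))
println(setupCalls) // 2: once per partition, not once per element
```

With map, the equivalent element-wise function would run five times here; with mapPartitions the setup code runs twice, once for each partition.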