Spark 2.3 RDD map: source code walkthrough

DPnice

Article link: https://blog.csdn.net/DPnice/article/details/80092247

Spark RDD.map source

  /**
   * Return a new RDD by applying a function to all elements of this RDD.
   */
  def map[U: ClassTag](f: T => U): RDD[U] = withScope {
    val cleanF = sc.clean(f)
    new MapPartitionsRDD[U, T](this, (context, pid, iter) => iter.map(cleanF))
  }

Scala Iterator.map source

  /** Creates a new iterator that maps all produced values of this iterator
   *  to new values using a transformation function.
   *
   *  @param f  the transformation function
   *  @return a new iterator which transforms every value produced by this
   *          iterator by applying the function `f` to it.
   *  @note   Reuse: $consumesAndProducesIterator
   */
  def map[B](f: A => B): Iterator[B] = new AbstractIterator[B] {
    def hasNext = self.hasNext
    def next() = f(self.next())
  }
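
The Iterator.map source above only builds a wrapper; f runs when next() is called on that wrapper. A minimal sketch of this behaviour in plain Scala, with no Spark dependency. The names IteratorMapDemo, Counter and doubling are hypothetical helpers introduced only for illustration:

```scala
object IteratorMapDemo {
  // Hypothetical helper: records how many times the mapping function has run.
  final class Counter { var calls = 0 }

  // Hypothetical mapping function: doubles its input and bumps the counter.
  def doubling(counter: Counter): Int => Int = x => {
    counter.calls += 1
    x * 2
  }

  def main(args: Array[String]): Unit = {
    val counter = new Counter
    val source = Iterator(1, 2, 3)

    // map only wraps the source iterator; the function has not run yet.
    val mapped = source.map(doubling(counter))
    println(s"calls after map: ${counter.calls}")            // calls after map: 0

    // next() pulls one element from the source and applies the function to it.
    val first = mapped.next()
    println(s"first = $first, calls = ${counter.calls}")     // first = 2, calls = 1

    // Draining the iterator applies the function to the remaining elements.
    val rest = mapped.toList
    println(s"rest = $rest, calls = ${counter.calls}")       // rest = List(4, 6), calls = 3
  }
}
```

Because the function is applied one element at a time as the iterator is consumed, a mapped partition does not need to be materialized up front.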

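map applies the supplied function f to every element of the iterator of each partition of the original RDD. Under the hood it uses Scala's Iterator.map: each call to next() on the mapped iterator invokes f on the next element of the underlying iterator. The result is a new RDD, and its number of partitions is the same as that of the original RDD.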
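
The per-partition behaviour can be sketched with a toy model in plain Scala. This is not Spark's implementation: ToyRDD, mapPartitions, numPartitions and collect below are hypothetical stand-ins written for illustration, and unlike a real RDD the toy version evaluates each partition eagerly. It only mirrors the shape of the map source above, where f is wrapped as iter.map(f) and applied to each partition's iterator:

```scala
object ToyMapPartitions {
  // Hypothetical stand-in for an RDD: a fixed list of in-memory partitions.
  final case class ToyRDD[T](partitions: Vector[Vector[T]]) {

    // Same shape as RDD.map: wrap f as a per-partition iterator transformation.
    def map[U](f: T => U): ToyRDD[U] =
      mapPartitions(iter => iter.map(f))

    // Applies g to each partition's iterator; one output partition per input partition.
    // Assumption: results are materialized eagerly here, which Spark does not do.
    def mapPartitions[U](g: Iterator[T] => Iterator[U]): ToyRDD[U] =
      ToyRDD(partitions.map(p => g(p.iterator).toVector))

    def numPartitions: Int = partitions.length

    def collect(): Vector[T] = partitions.flatten
  }

  def main(args: Array[String]): Unit = {
    val rdd = ToyRDD(Vector(Vector(1, 2), Vector(3), Vector(4, 5, 6)))
    val mapped = rdd.map(_ * 10)

    println(mapped.partitions)      // Vector(Vector(10, 20), Vector(30), Vector(40, 50, 60))
    println(mapped.numPartitions)   // 3
    println(mapped.collect())       // Vector(10, 20, 30, 40, 50, 60)
  }
}
```

Each input partition produces exactly one output partition, so the partition count and the partition boundaries carry over unchanged.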