Spark `filter` source (from `RDD.scala`):
/**
 * Return a new RDD containing only the elements that satisfy a predicate.
 */
def filter(f: T => Boolean): RDD[T] = withScope {
  val cleanF = sc.clean(f)
  new MapPartitionsRDD[T, T](
    this,
    (context, pid, iter) => iter.filter(cleanF),
    preservesPartitioning = true)
}
Here `context`, `pid`, and `iter` stand for the `TaskContext`, the partition index, and the partition's iterator, respectively.
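To make the shape of that `(context, pid, iter)` function concrete, here is a plain-Scala simulation (no Spark required) of what `MapPartitionsRDD` does for `filter`: it runs the function once per partition, and `filter` simply ignores the context and partition index and filters the partition's iterator. The object and method names below are illustrative only.

```scala
object PerPartitionFilterSketch {
  // Mirrors (context, pid, iter) => iter.filter(cleanF): each partition's
  // iterator is filtered independently; pid is available but unused.
  def runPerPartition[T](partitions: Vector[Vector[T]], f: T => Boolean): Vector[Vector[T]] =
    partitions.zipWithIndex.map { case (part, pid) =>
      part.iterator.filter(f).toVector
    }

  def main(args: Array[String]): Unit = {
    val parts = Vector(Vector(1, 2), Vector(3, 5), Vector(8, 9)) // 3 "partitions"
    println(runPerPartition(parts, (x: Int) => x != 2))
    // Vector(Vector(1), Vector(3, 5), Vector(8, 9))
  }
}
```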
Scala `Iterator.filter` source:
/** Returns an iterator over all the elements of this iterator that satisfy the predicate `p`.
 *  The order of the elements is preserved.
 *
 *  @param p the predicate used to test values.
 *  @return an iterator which produces those values of this iterator which satisfy the predicate `p`.
 *  @note Reuse: $consumesAndProducesIterator
 */
def filter(p: A => Boolean): Iterator[A] = new AbstractIterator[A] {
  // TODO 2.12 - Make a full-fledged FilterImpl that will reverse sense of p
  private var hd: A = _
  private var hdDefined: Boolean = false
  def hasNext: Boolean = hdDefined || {
    do {
      if (!self.hasNext) return false
      hd = self.next()
    } while (!p(hd))
    hdDefined = true
    true
  }
  def next() = if (hasNext) { hdDefined = false; hd } else empty.next()
}
The body of `hasNext` is the key part: it pulls elements from the underlying iterator until one satisfies the predicate `p`, caches it in `hd`, and discards the rest, so the result is a new iterator containing only the matching elements, in their original order. These per-partition iterators then make up the new RDD.
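The lazy, one-element-lookahead behavior of `Iterator.filter` can be demonstrated in plain Scala. The helper below is illustrative; the counter shows that a single `next()` only consumes as many source elements as needed to find the first match.

```scala
object IteratorFilterDemo {
  // Filtering through an iterator preserves order and drops non-matches.
  def filteredList(xs: List[Int], p: Int => Boolean): List[Int] =
    xs.iterator.filter(p).toList

  def main(args: Array[String]): Unit = {
    // Laziness: count how many source elements one next() call consumes.
    var pulled = 0
    val src = Iterator(1, 2, 3, 5).map { x => pulled += 1; x }
    val it = src.filter(_ > 1)
    it.next()       // pulls 1 (rejected by p), then 2 (accepted, cached in hd)
    println(pulled) // 2 — only enough elements to find the first match
    println(filteredList(List(1, 2, 3, 5, 8, 9), _ != 2)) // List(1, 3, 5, 8, 9)
  }
}
```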
Example:
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object Test extends App {
  val sparkConf = new SparkConf()
    .setAppName("Test")
    .setMaster("local[6]")
  val spark = SparkSession
    .builder()
    .config(sparkConf)
    .getOrCreate()
  val value: RDD[Int] = spark.sparkContext.parallelize(List(1, 2, 3, 5, 8, 9), 3)
  println(value.filter(_ != 2).getNumPartitions) // prints 3
}
The partitioning is not changed: the filtered RDD still has 3 partitions.
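The reason follows from `preservesPartitioning = true` and the absence of a shuffle: each partition is filtered in place, so a partition can become empty but is never removed. A plain-Scala simulation of this (names are illustrative, not Spark API):

```scala
object PartitionPreservationSketch {
  // Filter each "partition" independently, as Spark's filter task does.
  def filterPartitions(parts: Vector[Vector[Int]], p: Int => Boolean): Vector[Vector[Int]] =
    parts.map(_.filter(p))

  def main(args: Array[String]): Unit = {
    val parts = Vector(Vector(1, 2), Vector(3, 5), Vector(8, 9))
    val result = filterPartitions(parts, _ > 4)
    println(result.length) // 3 — partition count unchanged
    println(result)        // Vector(Vector(), Vector(5), Vector(8, 9)) — first is empty
  }
}
```

In real Spark, `getNumPartitions` on the filtered RDD would likewise still report 3; only a repartition or shuffle-producing operation changes the partition count.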