8. The intersection operation
Compute the intersection of an RDD containing the numbers 1, 1, 2 and an RDD containing the numbers 2, 2, 3.
scala> val rddData1 = sc.parallelize(Array(1,1,2))
rddData1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[5] at parallelize at <console>:24
scala> val rddData2 = sc.parallelize(Array(2,2,3))
rddData2: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[6] at parallelize at <console>:24
scala> val rddData3 = rddData1.intersection(rddData2)
rddData3: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[12] at intersection at <console>:28
scala> rddData3.collect
res2: Array[Int] = Array(2)
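Note:
With respect to duplicates, intersection behaves the opposite way to union. union keeps every element from both RDDs, whereas intersection returns only the elements present in both RDDs and removes duplicates from the result, which is why the output above contains a single 2.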
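To make the contrast with union concrete, the following is a minimal standalone sketch rather than part of the original session. It assumes Spark is on the classpath and runs in local mode; the object name `IntersectionVsUnion` and the helper `unionAndIntersection` are hypothetical names introduced only for illustration. The results are sorted so the printed order does not depend on partitioning.

```scala
import org.apache.spark.sql.SparkSession

object IntersectionVsUnion {
  // Hypothetical helper: runs union and intersection on two small RDDs
  // and returns both results as sorted local lists.
  def unionAndIntersection(a: Seq[Int], b: Seq[Int]): (List[Int], List[Int]) = {
    // Assumption: a local master is acceptable for this demonstration.
    val spark = SparkSession.builder()
      .appName("IntersectionVsUnion")
      .master("local[*]")
      .getOrCreate()
    try {
      val sc = spark.sparkContext
      val rddData1 = sc.parallelize(a)
      val rddData2 = sc.parallelize(b)
      // union keeps every element from both RDDs, duplicates included.
      val unioned = rddData1.union(rddData2).collect().sorted.toList
      // intersection keeps only the common elements, each of them once.
      val common = rddData1.intersection(rddData2).collect().sorted.toList
      (unioned, common)
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (unioned, common) = unionAndIntersection(Seq(1, 1, 2), Seq(2, 2, 3))
    println(unioned.mkString(", ")) // 1, 1, 2, 2, 2, 3
    println(common.mkString(", "))  // 2
  }
}
```

If the duplicates produced by union are not wanted, calling distinct on the unioned RDD removes them; intersection needs no such step.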