9. The subtract operation
Compute the difference between an RDD holding the numbers 1, 1, 2 and an RDD holding the numbers 2, 2, 3:
scala> val rddData1 = sc.parallelize(Array(1,1,2))
rddData1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[5] at parallelize at <console>:24
scala> val rddData2 = sc.parallelize(Array(2,2,3))
rddData2: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[6] at parallelize at <console>:24
scala> val rddData3 = rddData1.subtract(rddData2)
rddData3: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[16] at subtract at <console>:28
scala> rddData3.collect
res3: Array[Int] = Array(1, 1)
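Note:
The subtract operation computes a difference: it returns the elements of rddData1 whose values do not appear in rddData2. It does not deduplicate the result, which is why 1 appears twice in the output above.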
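If a true set difference (no repeated elements) is wanted, one option is to chain distinct after subtract. The following is a minimal sketch as a standalone program rather than a spark-shell session. The object name SubtractDemo, the helper names, and the local[*] master are assumptions made for illustration and are not part of the original example. Results are sorted before printing because collect does not guarantee element order.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical demo object; the names here are illustrative only.
object SubtractDemo {

  // Difference with duplicates kept: every left element whose value is absent from right.
  def difference(sc: SparkContext, left: Seq[Int], right: Seq[Int]): Array[Int] =
    sc.parallelize(left).subtract(sc.parallelize(right)).collect().sorted

  // Same difference, followed by distinct to obtain set semantics.
  def distinctDifference(sc: SparkContext, left: Seq[Int], right: Seq[Int]): Array[Int] =
    sc.parallelize(left).subtract(sc.parallelize(right)).distinct().collect().sorted

  def main(args: Array[String]): Unit = {
    // Assumes a local run; in spark-shell an existing `sc` would be used instead.
    val conf = new SparkConf().setAppName("SubtractDemo").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      println(difference(sc, Seq(1, 1, 2), Seq(2, 2, 3)).mkString(", "))         // 1, 1
      println(distinctDifference(sc, Seq(1, 1, 2), Seq(2, 2, 3)).mkString(", ")) // 1
    } finally {
      sc.stop()
    }
  }
}
```

Note that the element 2 is dropped entirely even though it occurs once on the left and twice on the right: subtract matches on values and does not compare occurrence counts.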