7. The union operation
Compute the union of an RDD holding the numbers 1 to 10 and an RDD holding the numbers 1 to 20.
scala> val rddData1 = sc.parallelize(1 to 10)
rddData1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[2] at parallelize at <console>:24
scala> val rddData2 = sc.parallelize(1 to 20)
rddData2: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[3] at parallelize at <console>:24
scala> val rddData3 = rddData1.union(rddData2)
rddData3: org.apache.spark.rdd.RDD[Int] = UnionRDD[4] at union at <console>:28
scala> rddData3.collect
res1: Array[Int] = Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20)
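Note:
The union operation does not deduplicate the elements of the two RDDs. The values 1 to 10 occur in both inputs, so they appear twice in the result, and rddData3 holds 30 elements rather than 20.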
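If a set-style union without duplicates is wanted, one common approach is to chain distinct onto the result of union. The following is a minimal sketch rather than part of the original session: the names merged and deduped are illustrative, and the local[*] master and application name are assumptions for running outside a cluster. Inside spark-shell, SparkContext.getOrCreate simply returns the existing sc.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumption: local mode with a made-up app name; in spark-shell this
// returns the already running context instead of creating a new one.
val conf = new SparkConf().setAppName("union-distinct-sketch").setMaster("local[*]")
val sc = SparkContext.getOrCreate(conf)

val rddData1 = sc.parallelize(1 to 10)
val rddData2 = sc.parallelize(1 to 20)

// union keeps every element of both inputs, duplicates included.
val merged = rddData1.union(rddData2)

// distinct removes duplicates; it triggers a shuffle, so the order of
// the result is not guaranteed and is sorted here only for display.
val deduped = merged.distinct()

println(merged.count())   // 30
println(deduped.count())  // 20
println(deduped.collect().sorted.mkString(", "))
```

Because distinct shuffles data across partitions, it is noticeably more expensive than union itself, so it is worth applying only when duplicates actually matter.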