16. The zipWithIndex operation
Create an RDD made up of the letters A to E, then pair each element with its corresponding index value.
scala> val rddData1 = sc.parallelize(Array("A","B","C","D","E"))
rddData1: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[0] at parallelize at <console>:24
scala> val rddData2 = rddData1.zipWithIndex()
rddData2: org.apache.spark.rdd.RDD[(String, Long)] = ZippedWithIndexRDD[1] at zipWithIndex at <console>:26
scala> rddData2.collect
res0: Array[(String, Long)] = Array((A,0), (B,1), (C,2), (D,3), (E,4))
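Explanation:
The zipWithIndex operation pairs each element of an RDD with that element's index in the RDD. Conceptually it works in two steps. Step 1 works out the index numbers; this is handled by the "ZippedWithIndexRDD" that appears in the output above, and for an RDD with more than one partition it requires a Spark job that counts the elements in each partition to derive each partition's starting index. Step 2 combines each element of the original RDD with its index, much like a zip operation. Indices are of type Long and start at 0.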
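
The two steps described above can be sketched in plain Scala, without Spark. This is only an illustrative model, not Spark's actual implementation: each inner Seq stands in for one partition, and the names ZipWithIndexSketch and zipWithIndexSketch are hypothetical names introduced for this example.

```scala
// A plain-Scala model of the two-step idea; it does not use or require Spark.
// Each inner Seq plays the role of one partition (an assumption made for illustration).
object ZipWithIndexSketch {
  def zipWithIndexSketch[T](partitions: Seq[Seq[T]]): Seq[Seq[(T, Long)]] = {
    // Step 1: count the elements in each partition, then turn the counts
    // into the starting index of each partition (a running sum beginning at 0).
    val counts: Seq[Long] = partitions.map(_.size.toLong)
    val startIndices: Seq[Long] = counts.scanLeft(0L)(_ + _).init

    // Step 2: inside each partition, pair every element with
    // (partition start index + position within the partition).
    partitions.zip(startIndices).map { case (part, start) =>
      part.zipWithIndex.map { case (elem, i) => (elem, start + i) }
    }
  }

  def main(args: Array[String]): Unit = {
    // The letters A to E spread over three hypothetical partitions.
    val partitions = Seq(Seq("A", "B"), Seq("C"), Seq("D", "E"))
    val indexed = zipWithIndexSketch(partitions)
    println(indexed.flatten.mkString(", "))
    // prints: (A,0), (B,1), (C,2), (D,3), (E,4)
  }
}
```

In this model the indices stay globally consecutive regardless of how the elements are split across partitions, which matches the result of rddData2.collect shown earlier.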