scala> val data = sc.parallelize(List((1, "www"), (1, "iteblog"), (1, "com"), (2, "bbs"), (2, "iteblog"), (2, "com"), (3, "good")))
data: org.apache.spark.rdd.RDD[(Int, String)] = ParallelCollectionRDD[1] at parallelize at <console>:24
scala> data.collectAsMap
res2: scala.collection.Map[Int,String] = Map(2 -> com, 1 -> com, 3 -> good)
That is, when the same key appears again later, the new value overwrites the earlier one, so collectAsMap keeps only a single value per key.
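
If you need to keep every value for a key instead of only the last one, one option (a sketch, not part of the original example; the printed output is illustrative and map ordering may differ) is to group the values before collecting:

scala> data.groupByKey().mapValues(_.toList).collectAsMap
res3: scala.collection.Map[Int,List[String]] = Map(2 -> List(bbs, iteblog, com), 1 -> List(www, iteblog, com), 3 -> List(good))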