-
Basic transformation operations
-
Key-value transformation operations
Basic transformation operations
-
map[U](f:(T)=>U):RDD[U]
Applies the specified function to every element of the RDD and returns a new RDD containing the results.
scala> var rdd = sc.textFile("/Users/lyf/Desktop/test/data1.txt")
rdd: org.apache.spark.rdd.RDD[String] = /Users/lyf/Desktop/test/data1.txt MapPartitionsRDD[13] at textFile at <console>:24
scala> rdd.map(line => line.split(" ")).collect
res16: Array[Array[String]] = Array(Array(Hello, World), Array(Hello, Tom), Array(Hello, Jerry))
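To make the semantics of map concrete without a running Spark cluster, here is a minimal sketch that models an RDD as a plain Scala sequence of partitions. LocalRDD and its methods are hypothetical names for illustration only, not Spark's API, and the file contents are assumed from the REPL output above. It can be pasted into the Scala REPL.

```scala
// Hypothetical, Spark-free model of an RDD: a sequence of partitions, each holding elements.
// LocalRDD is NOT part of Spark; it only mimics the behavior of RDD.map.
final case class LocalRDD[T](partitions: Seq[Seq[T]]) {
  // f is applied to each element inside its own partition, so the number of
  // partitions and the order of elements within each partition are preserved.
  def map[U](f: T => U): LocalRDD[U] = LocalRDD(partitions.map(_.map(f)))

  // Concatenates the partitions in order, like RDD.collect.
  def collect: Seq[T] = partitions.flatten
}

// Assumed contents of data1.txt (inferred from the REPL output), split into two partitions.
val lines = LocalRDD(Seq(Seq("Hello World", "Hello Tom"), Seq("Hello Jerry")))

// Same function as in the REPL session; toList makes the result easy to print and compare.
val words = lines.map(line => line.split(" ").toList)

println(words.collect)          // List(List(Hello, World), List(Hello, Tom), List(Hello, Jerry))
println(words.partitions.size)  // 2
```

Because map is a narrow transformation, each partition is transformed independently and no data moves between partitions, which is why the partition count stays the same.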
-
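distinct(): RDD[T]
Removes duplicate elements from the RDD and returns an RDD in which every element is unique. Because distinct shuffles the data, the order of the result is not guaranteed to match the input order.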
scala> var rdd = sc.parallelize(List(1,2,2,3,3,3,4,5))
rdd: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[15] at parallelize at <console>:24
scala> rdd.distinct.collect
res18: Array[Int] = Array(4, 1, 5, 2, 3)
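In Spark's source, distinct() simply calls distinct(partitions.length), and the general case keys every element, reduces by key into numPartitions partitions, then drops the keys again. The sketch below models that hash-shuffle idea on local collections. LocalRDD is a hypothetical name, not Spark's API, and the split of the sample data into four partitions is an assumption, since the REPL session does not show the partition count.

```scala
// Hypothetical local model of an RDD (not Spark's API): a sequence of partitions.
final case class LocalRDD[T](partitions: Seq[Seq[T]]) {
  def collect: Seq[T] = partitions.flatten

  // Models the idea behind distinct(numPartitions): route every element to a bucket
  // chosen by its hash (the "shuffle"), so equal elements always land in the same
  // bucket, then drop duplicates within each bucket independently.
  def distinct(numPartitions: Int): LocalRDD[T] = {
    require(numPartitions > 0, "numPartitions must be positive")
    val byBucket = collect.groupBy(x => Math.floorMod(x.##, numPartitions))
    val buckets = (0 until numPartitions).map(i => byBucket.getOrElse(i, Seq.empty[T]).distinct)
    LocalRDD(buckets)
  }

  // With no argument, keep the current number of partitions.
  def distinct(): LocalRDD[T] = distinct(partitions.size)
}

// Same data as the REPL example, assumed to be spread over four partitions.
val rdd = LocalRDD(Seq(Seq(1, 2), Seq(2, 3), Seq(3, 3), Seq(4, 5)))

println(rdd.distinct().collect.mkString(", "))  // 4, 1, 5, 2, 3
println(rdd.distinct(2).partitions.size)        // 2
```

Grouping by hash is what scrambles the order: elements come back bucket by bucket rather than in input order, which also explains why the REPL output above is not sorted.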
-
distinct(numPartitions: Int): RDD[T]
Same as distinct(), but the resulting RDD has numPartitions partitions.
scala> var rdd = sc.parallelize(List(1,2,2,3,3,3,4,5))
rdd: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[15] at parallelize at <console>:24
scala> var rddDistinct = rdd.distinct
rd