Spark RDD transformation operations
1. Create an RDD
val nums = sc.parallelize(List(1, 2, 3))
nums: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[2] at parallelize at <console>:21
2. Transform the RDD into a new RDD; the result is (1, 4, 9)
val squares = nums.map(x => x * x)
squares: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[3] at map at <console>:23
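The same map semantics can be sketched on a plain Scala List, without a SparkContext; RDD.map likewise applies the function to every element and returns a new dataset (the names `localNums` and `localSquares` are introduced here for illustration):

```scala
object MapSketch {
  def main(args: Array[String]): Unit = {
    // A local stand-in for the RDD's contents.
    val localNums = List(1, 2, 3)
    // map applies the squaring function to each element.
    val localSquares = localNums.map(x => x * x)
    println(localSquares)  // List(1, 4, 9)
  }
}
```

The lazy-evaluation difference still applies on the RDD side: the Spark `map` above builds a new RDD but computes nothing until an action such as `collect` runs.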
3. Filter the RDD to produce a new RDD; the result is 4
val even = squares.filter(_ % 2 == 0)
even: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[4] at filter at <console>:25
val result = squares.filter(_ % 2 == 0).collect
result: Array[Int] = Array(4)
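The whole map-then-filter chain can likewise be sketched on a local collection (again, `localResult` is a name introduced here, not from the session above):

```scala
object FilterSketch {
  def main(args: Array[String]): Unit = {
    // Square each element, then keep only the even squares.
    val localResult = List(1, 2, 3).map(x => x * x).filter(_ % 2 == 0)
    println(localResult)  // List(4)
  }
}
```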
4. val a = nums.flatMap(x => 1 to x).collect // apply the function to every element, then flatten the results
a: Array[Int] = Array(1, 1, 2, 1, 2, 3)
Where does this result come from?
1 => 1
2 => 1, 2
3 => 1, 2, 3
Each element expands to the range 1 to x, and the sequences on the right are then concatenated in order.
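The expansion above can be reproduced on a plain Scala List, since `flatMap` on collections has the same expand-then-concatenate behavior as on an RDD:

```scala
object FlatMapSketch {
  def main(args: Array[String]): Unit = {
    // Each element x expands to the sequence 1 to x,
    // then the per-element sequences are concatenated.
    val expanded = List(1, 2, 3).flatMap(x => 1 to x)
    println(expanded)  // List(1, 1, 2, 1, 2, 3)
  }
}
```

Compare with `map(x => 1 to x)`, which would return a list of three ranges instead of one flattened list.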