RDD in detail:
https://blog.csdn.net/u013850277/article/details/73648742
RDD creation method 1:
Parallelized collections are created by calling SparkContext's parallelize method on an existing collection in your driver program (a Scala Seq). The elements of the collection are copied to form a distributed dataset that can be operated on in parallel. For example, here is how to create a parallelized collection holding the numbers 1 to 5:
val data = Array(1, 2, 3, 4, 5)
val distData = sc.parallelize(data)
// Optionally, pass the number of partitions as a second argument:
// val distData = sc.parallelize(data, 5)
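The snippet above assumes a ready-made `sc`, as the Spark shell provides. As a rough, self-contained sketch of the same idea outside the shell, the program below builds its own SparkContext, parallelizes the numbers 1 to 5 into a chosen number of partitions, and sums them. The object name `ParallelizeSketch`, the helper `run`, and the `local[2]` master are assumptions made for illustration only; they do not come from the original notes.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical object name, used only for this sketch.
object ParallelizeSketch {

  // Returns (number of partitions, sum of the elements) for the numbers 1 to 5.
  def run(numSlices: Int): (Int, Int) = {
    // Assumption: local[2] runs Spark in-process with two worker threads.
    val conf = new SparkConf().setAppName("ParallelizeSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val data = Array(1, 2, 3, 4, 5)
      // The optional second argument sets how many partitions the data is split into.
      val distData = sc.parallelize(data, numSlices)
      // reduce combines the elements pairwise across partitions.
      (distData.getNumPartitions, distData.reduce((a, b) => a + b))
    } finally {
      sc.stop() // always release the context, even if the job fails
    }
  }

  def main(args: Array[String]): Unit = {
    val (partitions, sum) = run(5)
    println(s"partitions=$partitions, sum=$sum")
  }
}
```

Running this requires the Spark core library on the classpath. If the second argument to parallelize is omitted, Spark picks a default number of partitions based on the cluster configuration.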
Once created, the distributed dataset (distData) can be operated on in parallel. For example, we might call distData.reduce((a, b) => a + b) to add up the elements of the array. We describe opera