3. The filter operation
Collect all the prime numbers from an RDD of the natural numbers 1 to 100 into a new RDD.
scala> val rddData = sc.parallelize(1 to 100)
rddData: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[7] at parallelize at <console>:39
scala> import scala.util.control.Breaks._
import scala.util.control.Breaks._
scala> val rddData2 = rddData.filter(n => {
     |   var flag = if (n < 2) false else true
     |   breakable {
     |     for (x <- 2 until n) {
     |       if (n % x == 0) {
     |         flag = false
     |         break
     |       }
     |     }
     |   }
     |   flag
     | })
rddData2: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[8] at filter at <console>:44
scala> rddData2.collect
res3: Array[Int] = Array(2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97)
Notes:
import scala.util.control.Breaks._: using break requires this explicit import, because Scala has no built-in break statement.
rddData.filter: an element is added to the new RDD only if the predicate function returns true for it.
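The breakable/break pattern above works, but filter only needs a Boolean predicate, so the primality test can also be written without mutable state. A sketch of a loop-free version (the helper name isPrime is our own; it is shown here on a plain Scala range, whose filter has the same semantics as RDD.filter, so the same predicate could be passed to rddData.filter):

```scala
// exists short-circuits as soon as a divisor is found,
// and checking divisors only up to sqrt(n) cuts the work down
def isPrime(n: Int): Boolean =
  n >= 2 && !(2 to math.sqrt(n).toInt).exists(n % _ == 0)

// Same result as the RDD example, on a local collection
val primes = (1 to 100).filter(isPrime)
```

Because the predicate is a pure function with no captured mutable variables, it is also safe to ship to Spark executors as a closure.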