Simple RDD Exercises I (Transformations)

Convert an Array to an RDD by parallelizing a Scala collection (evaluated lazily, like a transformation: nothing is computed until an action runs)

scala> val r1 = sc.parallelize(Array(1,2,3,4,5,6))
r1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[31] at parallelize at <console>:24

Check the number of partitions of this RDD

scala> r1.partitions.length
res26: Int = 1

Convert an immutable List to an RDD (evaluated lazily, like a transformation)

scala> val r2 = sc.parallelize(List(4,5,6,7,8))
r2: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[32] at parallelize at <console>:24
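
The lazy evaluation mentioned above can be illustrated without a cluster. The following is only an analogy, assuming a plain Scala Iterator stands in for an RDD: `Iterator.map` plays the role of a transformation and `toList` plays the role of an action such as `collect`. The object name `LazyDemo` and its fields are made up for this sketch and are not part of Spark.

```scala
object LazyDemo {
  // Counts how many times the mapping function has actually run.
  var calls = 0

  // Building the pipeline: Iterator.map is lazy, so the function has not run yet.
  val pipeline: Iterator[Int] = Iterator(4, 5, 6, 7, 8).map { x => calls += 1; x * 2 }
  val callsBeforeForcing: Int = calls

  // Forcing the pipeline, comparable to calling an action such as collect on an RDD.
  val result: List[Int] = pipeline.toList
  val callsAfterForcing: Int = calls

  def main(args: Array[String]): Unit = {
    println(s"calls before forcing: $callsBeforeForcing")
    println(s"result: $result")
    println(s"calls after forcing: $callsAfterForcing")
  }
}
```

The mapping function runs only once the result is requested, which is the same reason the transcript shows an RDD description rather than data until `collect` is called.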

Convert a List to an RDD, multiply each element by 2, and sort; true: ascending order

scala> val r3 = sc.parallelize(List(1,2,3,4,5,6,10,1)).map(_*2).sortBy(x=>x,true)
r3: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[37] at sortBy at <console>:24
scala> r3.collect
res27: Array[Int] = Array(2, 2, 4, 6, 8, 10, 12, 20)
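
In the RDD API, the second argument of `sortBy` is the `ascending` flag, so passing `false` would sort in descending order. As a rough local sketch of both orderings, the same data can be sorted with plain Scala collections; here the reverse `Ordering` stands in for `ascending = false`, and the object name `SortDemo` is hypothetical.

```scala
object SortDemo {
  // Same input as the transcript, doubled element by element.
  val doubled: List[Int] = List(1, 2, 3, 4, 5, 6, 10, 1).map(_ * 2)

  // Ascending order, matching sortBy(x => x, true) on the RDD.
  val ascending: List[Int] = doubled.sortBy(x => x)

  // Descending order; on a local List this is done with a reversed Ordering.
  val descending: List[Int] = doubled.sortBy(x => x)(Ordering[Int].reverse)

  def main(args: Array[String]): Unit = {
    println(ascending)
    println(descending)
  }
}
```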

filter: keep only the elements greater than 5

scala> val r4 = r2.filter(_ > 5)
r4: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[38] at filter at <console>:26
scala> val r4 = r2.filter(_ > 5).collect
r4: Array[Int] = Array(6, 7, 8)

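Convert a List to an RDD, multiply each element by 2, and sort using a string key; true: ascending order. The keys are compared as strings, so 10 sorts before 2.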

scala> val r2 = sc.parallelize(List(1,2,3,4,5,3,7,9)).map(_*2).sortBy(x=>x+"",true)
r2: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[44] at sortBy at <console>:24
scala> val r2 = sc.parallelize(List(1,2,3,4,5,3,7,9)).map(_*2).sortBy(x=>x+"",true).collect
r2: Array[Int] = Array(10, 14, 18, 2, 4, 6, 6, 8)

Same as above, with x.toString as the sort key instead of x+""

scala> val r2 = sc.parallelize(List(1,2,3,4,5)).map(_*2).sortBy(x=>x.toString,true)
r2: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[54] at sortBy at <console>:24
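
The difference between a numeric key and a string key is easy to check locally. This is a minimal sketch using plain Scala collections rather than an RDD, on the assumption that the comparison logic is the same; the object name `StringSortDemo` is hypothetical.

```scala
object StringSortDemo {
  // Same input as the transcript: 2, 4, 6, 8, 10, 6, 14, 18 after doubling.
  val doubled: List[Int] = List(1, 2, 3, 4, 5, 3, 7, 9).map(_ * 2)

  // Numeric key: values are compared as integers.
  val numeric: List[Int] = doubled.sortBy(x => x)

  // String key: values are compared character by character, so "10" < "2".
  val lexicographic: List[Int] = doubled.sortBy(_.toString)

  def main(args: Array[String]): Unit = {
    println(numeric)
    println(lexicographic)
  }
}
```

The element type stays Int in both cases; only the key used for comparison changes, which is why the RDD in the transcript is still an RDD[Int].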

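flatMap: split each string on spaces and flatten the results into a single RDD. The "" in the output comes from the leading space in the third string.

scala> val r4 = sc.parallelize(Array("1 2 a b","c d e f"," g h j"))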
r4: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[55] at parallelize at <console>:24
scala> r4.flatMap(_.split(" ")).collect
res28: Array[String] = Array(1, 2, a, b, c, d, e, f, "", g, h, j)
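
The empty string in the result above is what `split(" ")` produces when a string begins with a space. The sketch below assumes the third input string had a leading space (which would explain the recorded output) and shows one possible way to drop such empty tokens. It uses plain Scala collections, and the object name `SplitDemo` is hypothetical.

```scala
object SplitDemo {
  // Assumption: the third string starts with a space.
  val lines: List[String] = List("1 2 a b", "c d e f", " g h j")

  // Splitting on a single space keeps the empty token before the leading space.
  val naive: List[String] = lines.flatMap(_.split(" "))

  // Filtering out empty tokens afterwards removes it.
  val cleaned: List[String] = lines.flatMap(_.split(" ")).filter(_.nonEmpty)

  def main(args: Array[String]): Unit = {
    println(naive)
    println(cleaned)
  }
}
```

The same `filter(_.nonEmpty)` step could be chained after `flatMap` on the RDD.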

flatMap on nested lists: split and flatten. Note: the inner flatMap here is the Scala collection's own method, not the RDD's.

scala> val r5 = sc.parallelize(List(List("a b c","1 2 3"),List("1 2 c","d f g")))
r5: org.apache.spark.rdd.RDD[List[String]] = ParallelCollectionRDD[57] at parallelize at <console>:24
scala> r5.flatMap(_.flatMap(_.split(" "))).collect
res29: Array[String] = Array(a, b, c, 1, 2, 3, 1, 2, c, d, f, g)
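
To see what each level of flatMap contributes, the two steps can be separated on a local collection. This is a sketch with plain Scala Lists; the outer List stands in for the RDD, and the object name `NestedFlatMapDemo` is hypothetical.

```scala
object NestedFlatMapDemo {
  val nested: List[List[String]] =
    List(List("a b c", "1 2 3"), List("1 2 c", "d f g"))

  // Inner flatMap only: each inner List[String] is split into tokens,
  // but the outer nesting is still there.
  val innerOnly: List[List[String]] = nested.map(_.flatMap(_.split(" ")))

  // Outer flatMap as well: the per-list token lists are merged into one.
  val flat: List[String] = nested.flatMap(_.flatMap(_.split(" ")))

  def main(args: Array[String]): Unit = {
    println(innerOnly)
    println(flat)
  }
}
```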