Common RDD Operations (Part 1)

take(N): returns the first N elements of the RDD, reading partitions in order (it is not a random sample)
scala> s3.take(4)
res4: Array[Int] = Array(1, 2, 3, 4)


takeOrdered(N): returns the N smallest elements of the RDD, in ascending order
scala> s3.takeOrdered(2)
res7: Array[Int] = Array(1, 2)


top(N): returns the N largest elements of the RDD, in descending order
scala> s3.top(4)
res8: Array[Int] = Array(7, 6, 5, 4)
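
To make the difference between the three calls concrete, here is a minimal sketch on an unsorted RDD. The object name `TakeDemo`, the `local[2]` master, and the sample data are assumptions for illustration only; they are not part of the original session.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object TakeDemo {
  // Returns (take(3), takeOrdered(3), top(3)) for a small unsorted RDD.
  def firstSmallestLargest(sc: SparkContext): (List[Int], List[Int], List[Int]) = {
    // Hypothetical sample data; parallelize keeps the list order across partitions.
    val rdd = sc.parallelize(List(5, 1, 7, 3, 6, 2, 4))
    val first    = rdd.take(3).toList        // first three in partition order: 5, 1, 7
    val smallest = rdd.takeOrdered(3).toList // three smallest, ascending: 1, 2, 3
    val largest  = rdd.top(3).toList         // three largest, descending: 7, 6, 5
    (first, smallest, largest)
  }

  def main(args: Array[String]): Unit = {
    // local[2] is an assumed master for trying this outside spark-shell.
    val sc = new SparkContext(new SparkConf().setAppName("TakeDemo").setMaster("local[2]"))
    try println(firstSmallestLargest(sc))
    finally sc.stop()
  }
}
```

All three calls bring their result back to the driver as a local array, so N should stay small.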


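map(): applies a function to every element
Returns: a new RDD
scala> val rdd = sc.parallelize(List(1,2,3,3))
scala> val rrr = rdd.map(x => x + 1)
scala> rrr.collect()
        Array[Int] = Array(2, 3, 4, 4)
scala> rrr.foreach(println) // prints each element; the order is not guaranteed, since foreach runs as parallel tasks, one per partition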
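
The unordered output of `foreach(println)` can be avoided on a small RDD by calling `collect()` first, which returns the elements to the driver in partition order. A minimal sketch, assuming a local master; the object name `ForeachOrderDemo` and the choice of four partitions are illustrative only.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ForeachOrderDemo {
  // Returns the mapped elements in a deterministic order.
  def orderedElements(sc: SparkContext): List[Int] = {
    // Four partitions (an assumed value) so that several tasks run in parallel.
    val rdd = sc.parallelize(List(1, 2, 3, 3), numSlices = 4)
    val mapped = rdd.map(_ + 1)
    // mapped.foreach(println) would print from the tasks, in no guaranteed order.
    // collect() gathers the partitions in order, so the list order is preserved.
    mapped.collect().toList
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ForeachOrderDemo").setMaster("local[2]"))
    try orderedElements(sc).foreach(println) // prints 2, 3, 4, 4 on separate lines
    finally sc.stop()
  }
}
```

Because `collect()` copies the whole RDD to the driver, this is only suitable for small results.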


scala> val rdd2=sc.makeRDD(List(1,2,3,4))
scala> rdd2.map(x=>1 to x).collect
res22: Array[scala.collection.immutable.Range.Inclusive] = Array(Range(1), Range(1, 2), Range(1, 2, 3), Range(1, 2, 3, 4))


flatMap(): applies a function that returns zero or more elements for each input element, then flattens all the results into a single new RDD
scala> rdd2.flatMap(x=>1 to x).collect
res21: Array[Int] = Array(1, 1, 2, 1, 2, 3, 1, 2, 3, 4)


Difference between flatMap() and map():
scala> t1.collect
res18: Array[String] = Array(hello hadoop world, hello spark hive sql nihao vim, linux nihao spark xiaomi, world xiaomi hive vim test, "this is spark test ", "hello hadoop is could ", "OpenStack is could could ")
// each comma-separated element is one line of the input


scala> val t2 = t1.map(x=>x.split(" "))
t2: org.apache.spark.rdd.RDD[Array[String]] = MapPartitionsRDD[17] at map at <console>:25
scala> t2.collect
res19: Array[Array[String]] = Array(Array(hello, hadoop, world), Array(hello, spark, hive, sql, nihao, vim), Array(linux, nihao, spark, xiaomi), Array(world, xiaomi, hive, vim, test), Array(this, is, spark, test), Array(hello, hadoop, is, could), Array(OpenStack, is, could, could))
// map produces one Array per line


scala> val t3=t1.flatMap(x=>x.split(" "))
t3: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[18] at flatMap at <console>:25
scala> t3.collect
res20: Array[String] = Array(hello, hadoop, world, hello, spark, hive, sql, nihao, vim, linux, nihao, spark, xiaomi, world, xiaomi, hive, vim, test, this, is, spark, test, hello, hadoop, is, could, OpenStack, is, could, could)
// flatMap flattens the per-line arrays into one RDD of words
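
The same contrast can be seen in the element types. The sketch below uses two of the lines shown above as sample input; the object name `MapVsFlatMapDemo` and the local master are assumptions for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object MapVsFlatMapDemo {
  // Returns (number of words in each line, all words as one flat list).
  def wordsPerLineAndAllWords(sc: SparkContext): (List[Int], List[String]) = {
    val lines: RDD[String] = sc.parallelize(List("hello hadoop world", "this is spark test "))
    // map keeps one output element per input line: an Array of that line's words.
    val perLine: RDD[Array[String]] = lines.map(_.split(" "))
    // flatMap unpacks every Array, so the RDD holds the words themselves.
    val words: RDD[String] = lines.flatMap(_.split(" "))
    (perLine.map(_.length).collect().toList, words.collect().toList)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("MapVsFlatMapDemo").setMaster("local[2]"))
    try println(wordsPerLineAndAllWords(sc))
    finally sc.stop()
  }
}
```

Note that `split(" ")` drops trailing empty strings, which is why the line ending in a space still yields four words.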


filter()
Applies a predicate to each element and returns a new RDD containing only the elements that satisfy it
scala> val rdd = sc.parallelize(List(1,2,3,3))
scala> rdd.filter(x => x != 1).collect()
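     Array[Int] = Array(2, 3, 3)


countApprox(timeout): returns an approximate element count, based on whatever results are available when the timeout (in milliseconds) expires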
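
The original notes give no session for `countApprox`, so here is a minimal sketch of how it might be called. The object name `CountApproxDemo`, the data size, the 2000 ms timeout, and the local master are all assumptions. The call returns a `PartialResult[BoundedDouble]` rather than a plain number.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CountApproxDemo {
  // Returns (low, mean, high) of the final count estimate.
  def approxCount(sc: SparkContext): (Double, Double, Double) = {
    // Hypothetical data set: the numbers 1 to 100000 in 8 partitions.
    val rdd = sc.parallelize(1 to 100000, numSlices = 8)
    // timeout is in milliseconds; confidence defaults to 0.95.
    val partial = rdd.countApprox(timeout = 2000L, confidence = 0.95)
    // initialValue holds the estimate available when the timeout expired
    // (or the exact count, if the job had already finished by then).
    val estimate = partial.initialValue
    println(s"estimate: ${estimate.mean} in [${estimate.low}, ${estimate.high}]")
    // getFinalValue() blocks until the job completes, giving the exact count.
    val exact = partial.getFinalValue()
    (exact.low, exact.mean, exact.high)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("CountApproxDemo").setMaster("local[2]"))
    try println(approxCount(sc))
    finally sc.stop()
  }
}
```

On a data set this small the job normally finishes well within the timeout, so the estimate and the exact count coincide; the approximation only matters for long-running counts.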


sortBy(): sorts the RDD by the key that the given function returns
scala> val d1=sc.makeRDD(List(3,1,4,2,6,9))
scala> d1.sortBy(x=>x) // ascending by default
scala> res30.collect
res31: Array[Int] = Array(1, 2, 3, 4, 6, 9)
scala> d1.sortBy(x=>x,false) // false as the second argument sorts in descending order
scala> res32.collect
res33: Array[Int] = Array(9, 6, 4, 3, 2, 1)


scala> val l1=sc.makeRDD(List(("cat",21),("dog",2),("pig",3)))
scala> l1.sortBy(x=>x._2).collect
res69: Array[(String, Int)] = Array((dog,2), (pig,3), (cat,21))
scala> l1.sortBy(x=>x._1).collect
res70: Array[(String, Int)] = Array((cat,21), (dog,2), (pig,3))
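
The key function may also return a tuple, which gives a secondary sort key for breaking ties. A minimal sketch under assumed data: the extra `("ant", 2)` entry, the object name `SortByDemo`, and the local master are illustrative additions.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SortByDemo {
  // Returns (ascending by count then name, descending by count then name).
  def sortedPets(sc: SparkContext): (List[(String, Int)], List[(String, Int)]) = {
    val pets = sc.parallelize(List(("cat", 21), ("dog", 2), ("pig", 3), ("ant", 2)))
    // Tuple keys compare element by element: count first, then name.
    val ascending = pets.sortBy(p => (p._2, p._1))
    // Negating the count reverses the numeric order while names stay ascending.
    val descending = pets.sortBy(p => (-p._2, p._1))
    (ascending.collect().toList, descending.collect().toList)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SortByDemo").setMaster("local[2]"))
    try {
      val (asc, desc) = sortedPets(sc)
      println(asc)  // List((ant,2), (dog,2), (pig,3), (cat,21))
      println(desc) // List((cat,21), (pig,3), (ant,2), (dog,2))
    } finally sc.stop()
  }
}
```

Using a tuple key instead of `ascending = false` keeps the result deterministic when two elements share the same count.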


countByValue(): counts how many times each distinct element occurs and returns the result as a Map on the driver
scala> val c=sc.makeRDD(List(1,3,1,4,1,3))
scala> c.countByValue

res75: scala.collection.Map[Int,Long] = Map(4 -> 1, 1 -> 3, 3 -> 2)
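
Since `countByValue` returns a local Map, it suits RDDs with few distinct values. When the counts should stay distributed, one common alternative is `map` followed by `reduceByKey`. The sketch below compares the two; the object name `CountByValueDemo` and the local master are assumptions.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CountByValueDemo {
  // Returns the counts computed both ways, as local immutable Maps.
  def counts(sc: SparkContext): (Map[Int, Long], Map[Int, Long]) = {
    val rdd = sc.parallelize(List(1, 3, 1, 4, 1, 3))
    // countByValue is an action: the whole result Map is built on the driver.
    val onDriver: Map[Int, Long] = rdd.countByValue().toMap
    // reduceByKey is a transformation: the counts remain an RDD until collected.
    val distributed = rdd.map(x => (x, 1L)).reduceByKey(_ + _)
    (onDriver, distributed.collect().toMap)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("CountByValueDemo").setMaster("local[2]"))
    try println(counts(sc))
    finally sc.stop()
  }
}
```

Both Maps contain 1 -> 3, 3 -> 2 and 4 -> 1; only the place where the aggregation result lives differs.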


Readers are welcome to add QQ 1204738320 to discuss and learn together.

