Common Transformation and Action Operators

Author: 小财迷，嘻嘻

Article link: https://blog.csdn.net/weixin_48185778/article/details/109535649

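RDD operations fall into two kinds: lazy and non-lazy.

Transformation (lazy): builds a new RDD from an existing one without computing it yet; also called a transformation operation or transformation operator.

Action (non-lazy): runs immediately and triggers the actual computation; also called an action operation or action operator.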
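A minimal sketch of the two kinds, assuming spark-core 2.x/3.x on the classpath (it also pastes into spark-shell, where `SparkContext.getOrCreate` simply returns the existing `sc`). The app name and the `local[2]` master are illustrative choices, not from the article. The point is the types: a transformation returns another RDD, while an action returns an ordinary value to the driver.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Returns the active SparkContext (e.g. spark-shell's sc) or starts a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("rdd-ops-demo").setMaster("local[2]"))

val nums: RDD[Int] = sc.makeRDD(1 to 9, 2)

// Transformation: the result is another RDD, i.e. a description of work, not data.
val doubled: RDD[Int] = nums.map(_ * 2)

// Actions: the results are ordinary Scala values delivered to the driver.
val total: Long = doubled.count()          // 9
val values: Array[Int] = doubled.collect() // Array(2, 4, 6, ..., 18)
```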

1. Transformation

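Transformations never compute their results directly.

They only record the operations applied to the RDD.
The actual computation happens only when an action operator (Action) is encountered.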
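One way to observe this laziness, as a sketch under the same assumptions (local mode; the accumulator name `mapCalls` is arbitrary): a `LongAccumulator` counts how many times the function passed to `map` actually runs. It stays at 0 after `map` is declared and only moves once the `count()` action runs. Updating an accumulator inside a transformation is done here purely for illustration; Spark only guarantees exactly-once accumulator updates inside actions, and re-executed tasks may count twice.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("lazy-demo").setMaster("local[2]"))

// Accumulator used only to count how often the map function really runs.
val calls = sc.longAccumulator("mapCalls")

// Declaring the transformation records lineage only; no task runs here.
val mapped = sc.makeRDD(1 to 9, 2).map { x => calls.add(1); x * 2 }
val beforeAction: Long = calls.sum // 0

// The action launches a job, which finally executes the map function.
mapped.count()
val afterAction: Long = calls.sum  // 9, one call per element
```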

1.1 map

map(func): applies func to every element of the source RDD and returns a new RDD of the results.

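The input partitions of map correspond one-to-one to its output partitions: partition i of the result is computed from partition i of the source.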

val rdd1 = sc.makeRDD(1 to 9,2)
rdd1.map(_*2).collect.foreach(println)
//Output (one per line): 2 4 6 8 10 12 14 16 18
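To see the one-to-one partition mapping directly, the sketch below (the setup line is added so it runs standalone; app name and master are illustrative) uses `glom()`, which turns each partition into an array. The contents in the comments follow from how `makeRDD` slices the range `1 to 9` into 2 partitions.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("map-partitions").setMaster("local[2]"))

val rdd1 = sc.makeRDD(1 to 9, 2)
val rdd2 = rdd1.map(_ * 2)

// glom() exposes each partition as an array so the layout can be inspected.
val inParts  = rdd1.glom().collect() // Array(Array(1, 2, 3, 4), Array(5, 6, 7, 8, 9))
val outParts = rdd2.glom().collect() // Array(Array(2, 4, 6, 8), Array(10, 12, 14, 16, 18))

// Partition i of the output is exactly partition i of the input, mapped.
val oneToOne = inParts.zip(outParts).forall { case (in, out) =>
  in.map(_ * 2).sameElements(out)
}
```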

1.2 filter

filter(func): applies func to each element of the source RDD and keeps only the elements for which it returns true, producing a new RDD.

val rdd1 = sc.makeRDD(1 to 9,2)
rdd1.filter(_>5).collect.foreach(println)
//Output (one per line): 6 7 8 9
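Like `map`, `filter` works partition by partition, so it keeps the number of partitions and only drops elements inside each one. A sketch under the same local-mode assumptions: with the predicate `_ > 5`, the first partition (1 to 4) ends up empty.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("filter-partitions").setMaster("local[2]"))

val rdd1 = sc.makeRDD(1 to 9, 2)
val filtered = rdd1.filter(_ > 5)

// The partition count is unchanged; rejected elements simply disappear.
val numParts = filtered.getNumPartitions // 2
val parts = filtered.glom().collect()    // Array(Array(), Array(6, 7, 8, 9))
```

If a very selective filter leaves many partitions empty, `coalesce` can reduce the partition count afterwards.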

1.3 mapValues

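mapValues(func): applies func to each value while leaving the key unchanged; each original key is paired with its new value to form the elements of the new RDD. Only applicable to PairRDDs (RDDs of key-value pairs).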

val rdd1=sc.parallelize(List("dog","tiger","lion","cat","panther","eagle"))
val rdd4 = rdd1.map(x=>(x.length,x))
rdd4.mapValues(x=>"_"+x+"_").collect.foreach(println)
//Output:
//(3,_dog_)
//(5,_tiger_)
//(4,_lion_)
//(3,_cat_)
//(7,_panther_)
//(5,_eagle_)
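Because `mapValues` cannot touch the keys, Spark can keep the RDD's existing partitioner, whereas an equivalent `map` drops it (Spark cannot know that the keys were left alone). A sketch, assuming the same word list and a `HashPartitioner` with 2 partitions chosen purely for illustration:

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("mapvalues-demo").setMaster("local[2]"))

val words = sc.parallelize(List("dog", "tiger", "lion", "cat", "panther", "eagle"))

// Hash-partition by key (word length) so the RDD has a known partitioner.
val byLength = words.map(w => (w.length, w)).partitionBy(new HashPartitioner(2))

// mapValues can only change values, so the partitioner is preserved.
val viaMapValues = byLength.mapValues(w => "_" + w + "_")

// map could change keys, so Spark discards the partitioner,
// even though this particular function leaves the keys alone.
val viaMap = byLength.map { case (k, w) => (k, "_" + w + "_") }

val keptPartitioner = viaMapValues.partitioner // a Some holding the HashPartitioner
val lostPartitioner = viaMap.partitioner       // None
```

In practice this can matter when the result is later joined or reduced by key: a preserved partitioner can let Spark skip a shuffle.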

1.4 distinct

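distinct([numTasks]): removes duplicates, returning a new RDD in which each element of the source appears once; the optional numTasks sets the number of partitions of the result.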

val conf = new SparkConf
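The article's own `distinct` program is cut off after its first line. As a separate sketch (not a reconstruction of that program), assuming a local `SparkContext` and a made-up input list: `distinct` involves a shuffle, so the order of the result is not guaranteed and the output is sorted on the driver before use; the optional argument sets the number of partitions of the result.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("distinct-demo").setMaster("local[2]"))

// Hypothetical input with repeated values, spread over 3 partitions.
val dup = sc.makeRDD(List(1, 1, 2, 3, 3, 3), 3)

// Shuffle-based dedup: sort on the driver because output order is not fixed.
val unique = dup.distinct().collect().sorted // Array(1, 2, 3)

// Passing a number controls how many partitions the result has.
val uniqueIn2 = dup.distinct(2)
val parts = uniqueIn2.getNumPartitions      // 2
```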
