Spark Operators: Transformations


An introduction to the commonly used Spark Transformations operators.

map(func)
  Return a new distributed dataset formed by passing each element of the source through a function func.
  (Applies a function to every element of the RDD and returns a new RDD.)

filter(func)
  Return a new dataset formed by selecting those elements of the source on which func returns true.
  (Evaluates each element of the RDD and returns only the elements that satisfy the condition.)

flatMap(func)
  Similar to map, but each input item can be mapped to 0 or more output items (so func should return a Seq rather than a single item).
  (Like map, but each element may produce zero or more output elements.)

mapPartitions(func)
  Similar to map, but runs separately on each partition (block) of the RDD, so func must be of type Iterator<T> => Iterator<U> when running on an RDD of type T.
  (Like map, but func is applied to each partition as a whole.)

groupByKey([numPartitions])
  When called on a dataset of (K, V) pairs, returns a dataset of (K, Iterable<V>) pairs. Note: If you are grouping in order to perform an aggregation (such as a sum or average) over each key, using reduceByKey or aggregateByKey will yield much better performance. Note: By default, the level of parallelism in the output depends on the number of partitions of the parent RDD. You can pass an optional numPartitions argument to set a different number of tasks.
  (Groups the elements by key and returns (Key, Iterable<Value>) pairs; see the aggregateByKey sketch after this list.)

reduceByKey(func, [numPartitions])
  When called on a dataset of (K, V) pairs, returns a dataset of (K, V) pairs where the values for each key are aggregated using the given reduce function func, which must be of type (V,V) => V. Like in groupByKey, the number of reduce tasks is configurable through an optional second argument.
  (Reduces the values of each key with the given function.)

sortByKey([ascending], [numPartitions])
  When called on a dataset of (K, V) pairs where K implements Ordered, returns a dataset of (K, V) pairs sorted by keys in ascending or descending order, as specified in the boolean ascending argument.
  (Sorts the dataset by key.)
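
The groupByKey note above recommends reduceByKey or aggregateByKey whenever the goal is a per-key aggregation. As a minimal sketch (the object name and sample data below are made up for illustration and are not part of the original demo), aggregateByKey can compute a per-key sum and count in a single pass and then derive the average, without materializing the full list of values per key the way groupByKey does:

import org.apache.spark.{SparkConf, SparkContext}

object AggregateByKeyApp {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("AggregateByKeyApp").setMaster("local[2]"))
    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3), ("b", 4), ("a", 5)), 2)
    // aggregateByKey(zeroValue)(seqOp, combOp):
    //   seqOp  merges one value into the per-partition accumulator (sum, count)
    //   combOp merges accumulators coming from different partitions
    val sumCount = pairs.aggregateByKey((0, 0))(
      (acc, v) => (acc._1 + v, acc._2 + 1),
      (a, b)   => (a._1 + b._1, a._2 + b._2)
    )
    // average per key; prints (a,3.0) and (b,3.0), order may vary
    sumCount.mapValues { case (sum, cnt) => sum.toDouble / cnt }.foreach(println)
    sc.stop()
  }
}

Like reduceByKey, aggregateByKey combines values on the map side before the shuffle, which is why the documentation prefers it over groupByKey for sums and averages.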

Demo

import org.apache.spark.{SparkConf, SparkContext}

/**
  * Spark Transformations demo
  */
object TransformationsApp {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("TransformationsApp").setMaster("local[2]")
    val sc = new SparkContext(sparkConf)
    // create a dataset
    val data1 = sc.parallelize(Array("a","b","c","d","e"),2)
    // map demo: operate on every element, returning a tuple for each
    //val mapData = data1.map((_,1)).foreach(println)
    /*
     (a,1)
     (b,1)
     (c,1)
     (d,1)
     (e,1)
      */
    // filter demo: keep only the elements that satisfy the predicate
    //val filterData = data1.filter(x => x == "a").foreach(println) // prints: a

    val data2 = sc.parallelize(Array(Array("a","b","c","d","e"),Array("q","w","r")))
    //val mapData2 = data2.map((_,1)).foreach(println(_))
    /*
    ([Ljava.lang.String;@5c4cd0b8,1)
    ([Ljava.lang.String;@b40de16,1)
     */
    //val flatMapData = data2.flatMap(_.map((_,1))).foreach(println(_))
    /*
    (a,1)
    (b,1)
    (c,1)
    (d,1)
    (e,1)
    (q,1)
    (w,1)
    (r,1)
    */
    // Comparing the two results shows that flatMap first flattens the elements and then applies the map function.

//    val mapPartitions=data1.mapPartitions(x=>{
//      x.map((_,1))
//    }).foreach(println(_))
    /*
    (a,1)
    (c,1)
    (b,1)
    (d,1)
    (e,1)
     */
    // The result is similar to map's, but mapPartitions operates on each partition.
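    // The practical difference from map: mapPartitions calls func once per
    // partition with an Iterator, so per-partition setup work (for example
    // opening a database connection) is paid once per partition instead of
    // once per element. A minimal sketch; "prefix" is only an illustrative
    // stand-in for such setup and is not part of the original demo:
//    data1.mapPartitions { iter =>
//      val prefix = "partition-"   // built once per partition
//      iter.map(x => prefix + x)   // reused for every element in the partition
//    }.foreach(println(_))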
    val data3 = sc.parallelize(Array("a","b","c","d","e","a","a","d","d"),1)
    // group by key, returning (Key, Iterable[V]) pairs
    //val groupByKeyData = data3.map((_,1)).groupByKey().foreach(println(_))
    /*
    (e,CompactBuffer(1))
    (d,CompactBuffer(1, 1))
    (a,CompactBuffer(1, 1, 1))
    (b,CompactBuffer(1, 1))
    (c,CompactBuffer(1))
     */
    // aggregate by key, returning the reduced value for each key
    //val reduceByKeyData = data3.map((_,1)).reduceByKey(_+_).foreach(println(_))
   /*
    (e,1)
    (a,3)
    (d,2)
    (c,1)
    (b,2)
    */
    // sort by key; note that the printed order is only guaranteed within each partition
    data3.map((_,1)).reduceByKey(_+_).sortByKey().foreach(println(_))
    // sortBy lets you choose the field to sort on; ascending by default
    data3.map((_,1)).reduceByKey(_+_).sortBy(_._2).foreach(println(_))
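    // sortBy also takes an ascending flag (true by default); a minimal sketch
    // of sorting by value in descending order:
    //data3.map((_,1)).reduceByKey(_+_).sortBy(_._2, ascending = false).foreach(println(_))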
    sc.stop()
  }
}