Scala: should groupBy go before or after partitioning?

Latest recommended article published on 2021-10-30 08:56:50

风是外衣衣衣

Category: 小知识点 (Quick Tips)

Article link: https://blog.csdn.net/weixin_41804049/article/details/98872911

Requirement: group the records by id, then sort each group by utc.


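    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SparkSession

    // A is assumed to be a case class of the form A(uuid, utc); its definition is not shown in this post.
    val conf = new SparkConf()
      .setAppName("flow")
      .setMaster("local[*]")
      // classOf[T] registers the class itself; T.getClass on a case-class companion would register the companion's class instead.
      .registerKryoClasses(Array[Class[_]](classOf[A], classOf[Trip], classOf[Line], classOf[Log], classOf[LogMinor], classOf[LogData], classOf[UnConformData], classOf[LineX], classOf[MatchDataMajor]))

    val sparkSession = SparkSession.builder().config(conf).enableHiveSupport().getOrCreate()

    // Correct approach: group the local list first, then parallelize the groups.
    val list: List[A] = List(A(1, 234), A(1, 123), A(1, 345), A(1, 456))
    val data = sparkSession.sparkContext.parallelize(list.groupBy(_.uuid).toList) // groupBy before partitioning
    data.foreachPartition { partition =>
      // Each element is a complete (uuid, records) group, so the sort by utc covers every record for that id.
      partition.foreach { case (_, records) => records.sortBy(_.utc).foreach(println) }
    }
    /* Output:
       A(1,123)
       A(1,234)
       A(1,345)
       A(1,456)
    */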
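Grouping on the driver as above only works while the whole list fits in driver memory. When the data already lives in an RDD, a possible alternative is to let Spark do the grouping: `RDD.groupBy` shuffles by key, so all records with the same key end up in one group regardless of how they were partitioned. The following is a minimal sketch, not the author's code; the `A(uuid: Int, utc: Long)` definition, the object name `GroupThenSort`, and the extra sample records with `uuid = 2` are assumptions made for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Assumed record shape; the post only shows A(uuid, utc) being constructed.
case class A(uuid: Int, utc: Long)

// Hypothetical object name, used only for this sketch.
object GroupThenSort {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("group-then-sort")
      .master("local[*]")
      .getOrCreate()

    // Sample data (assumed): two ids, deliberately out of utc order.
    val records = List(A(1, 234), A(1, 123), A(2, 50), A(1, 345), A(2, 10))
    val rdd = spark.sparkContext.parallelize(records)

    // groupBy shuffles by uuid, so each group holds every record for that id;
    // mapValues then sorts each group by utc.
    val sorted = rdd.groupBy(_.uuid).mapValues(_.toList.sortBy(_.utc))

    // Collect to the driver only for printing; fine for a small demo.
    sorted.collect().sortBy(_._1).foreach { case (id, rs) => println(s"$id -> $rs") }

    spark.stop()
  }
}
```

The shuffle has a cost, so for large groups it may be worth looking at key-based alternatives, but for the requirement stated here this keeps the grouping correct without collecting the data first.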

    // conf and sparkSession are created exactly as in the previous snippet.

    // Wrong approach: parallelize the raw list, then group inside each partition.
    val list: List[A] = List(A(1, 234), A(1, 123), A(1, 345), A(1, 456))
    val data = sparkSession.sparkContext.parallelize(list)

    data.foreachPartition { partition =>
      // groupBy here only sees the records held by this one partition,
      // so records with the same uuid that sit in other partitions never join the group.
      partition.toList.groupBy(_.uuid).map(_._2.sortBy(_.utc)).foreach(println)
    }
    /* Output (each record ended up in a group of its own):
       List(A(1,123))
       List(A(1,234))
       List(A(1,345))
       List(A(1,456))
    */
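The single-element lists in that output can be reproduced without Spark. The sketch below simulates the situation in plain Scala, assuming each of the four records landed in its own partition (which is presumably what happened with `local[*]` here). The `A(uuid: Int, utc: Long)` definition and the names `PartitionGroupingDemo` and `groupAndSort` are hypothetical, introduced only for illustration.

```scala
// Assumed record shape; the post only shows A(uuid, utc) being constructed.
case class A(uuid: Int, utc: Long)

// Hypothetical object name, used only for this sketch.
object PartitionGroupingDemo {
  // Group by uuid and sort each group by utc, mirroring the per-partition logic above.
  def groupAndSort(records: Seq[A]): Map[Int, Seq[A]] =
    records.groupBy(_.uuid).map { case (id, rs) => id -> rs.sortBy(_.utc) }

  def main(args: Array[String]): Unit = {
    val records = List(A(1, 234), A(1, 123), A(1, 345), A(1, 456))

    // Simulate four partitions that hold one record each.
    val partitions: List[List[A]] = records.grouped(1).toList

    // Grouping inside each partition: every group contains a single record,
    // because no partition can see the records held by the others.
    partitions.map(groupAndSort).foreach(println)

    // Grouping over the whole data set: one group with all four records, sorted by utc.
    println(groupAndSort(records))
  }
}
```

The point of the comparison is that `groupBy` on a partition's iterator is a purely local operation: it is only equivalent to a global group when every record for a key is already in the same partition.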
