Custom Sorting in Scala Explained

知足但小新

Category: spark. Tags: big data

Article link: https://blog.csdn.net/weixin_47688331/article/details/108023772

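Background: when sorting with an RDD's sortBy or sortByKey method, the ordering rule often has to be tailored to the requirement at hand.

Scenario:

The data set has records of the form (name, years of service, salary), and we want to sort it by salary in descending order, breaking ties by name in ascending order.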

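Method 1: Use the built-in ordering of tuples

Tuple ordering rule: compare the first fields; if they are equal, compare the second fields; if those are equal too, compare the third fields, and so on.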
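The tuple rule can be tried on an ordinary Scala collection before involving Spark. The following is a minimal sketch, assuming the four sample records used throughout this post; the object name TupleOrderingDemo and the helper sortedNames are made up for illustration.

```scala
object TupleOrderingDemo {
  // Sample records from this post: (name, years, salary)
  val staff: List[(String, Int, Double)] = List(
    ("laowang", 3, 15000.0),
    ("laoli", 5, 20000.0),
    ("laoma", 1, 23000.0),
    ("laohu", 5, 15000.0)
  )

  // The sort key is a tuple: negating the salary turns the default ascending
  // order into "salary descending"; the name breaks ties in ascending order.
  def sortedNames: List[String] =
    staff.sortBy(t => (-t._3, t._1)).map(_._1)

  def main(args: Array[String]): Unit =
    println(sortedNames) // List(laoma, laoli, laohu, laowang)
}
```

The two records with salary 15000 are ordered by name, so laohu comes before laowang. Negation only works for numeric fields; a field such as a String needs an explicit Ordering to be sorted in descending order.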

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object MySort1 {
  def main(args: Array[String]): Unit = {
    val conf: SparkConf = new SparkConf().setAppName(this.getClass.getCanonicalName).setMaster("local[*]")
    val sc = new SparkContext(conf)
    // Create the source RDD
    val lines: RDD[String] = sc.parallelize(List("laowang,3,15000", "laoli,5,20000", "laoma,1,23000", "laohu,5,15000"))
    // Parse each line into (name, years, salary)
    val disposed: RDD[(String, Int, Double)] = lines.mapPartitions(it => {
      it.map(line => {
        val fields: Array[String] = line.split(",")
        val name: String = fields(0)
        val years: Int = fields(1).toInt
        val salary: Double = fields(2).toDouble
        (name, years, salary)
      })
    })
    // Sort with the tuple ordering: salary descending (negated), then name ascending
    val result: RDD[(String, Int, Double)] = disposed.sortBy(t => (-t._3, t._1))
    // Print the sorted records
    println(result.collect().toBuffer)
    sc.stop()
  }
}

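Method 2: Define an ordinary class, Staff, that mixes in the Ordered trait (or implements Comparable) and also implements Serializable

Override the compare method (for Ordered) or the compareTo method (for Comparable). The class has to be serializable because sorting shuffles the data.

1. Wrap the data in the class; override toString so that the printed result is readable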

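class Staff(val name: String, val years: Int, val salary: Double) extends Ordered[Staff] with Serializable {
  override def compare(that: Staff): Int = {
    if (this.salary == that.salary) {
      // Equal salaries: break the tie by years ascending (method 1 breaks ties by name instead)
      Integer.compare(this.years, that.years)
    } else {
      // Arguments are swapped so that the higher salary comes first (descending)
      java.lang.Double.compare(that.salary, this.salary)
    }
  }
  // Readable output when the sorted result is printed
  override def toString = s"Staff($name, $years, $salary)"
}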

Because Staff overrides compare with the custom rule, the RDD can be sorted directly by the instances themselves.

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object MySort2 {
  def main(args: Array[String]): Unit = {
    val conf: SparkConf = new SparkConf().setAppName(this.getClass.getCanonicalName).setMaster("local[*]")
    val sc = new SparkContext(conf)
    // Create the source RDD
    val lines: RDD[String] = sc.parallelize(List("laowang,3,15000", "laoli,5,20000", "laoma,1,23000", "laohu,5,15000"))
    // Parse each line into a Staff instance
    val disposed: RDD[Staff] = lines.mapPartitions(it => {
      it.map(line => {
        val fields: Array[String] = line.split(",")
        val name: String = fields(0)
        val years: Int = fields(1).toInt
        val salary: Double = fields(2).toDouble
        new Staff(name, years, salary)
      })
    })
    // Sort rule: sort by the instances themselves, which uses Staff.compare
    val result: RDD[Staff] = disposed.sortBy(x => x)
    // Print the sorted records
    println(result.collect().toBuffer)
    sc.stop()
  }
}

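2. Keep the data out of the class

The records do not have to be wrapped in instances. The class is used only as the sort key, so the RDD still holds plain tuples and only the ordering rule comes from the class.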

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object MySort3 {
  def main(args: Array[String]): Unit = {
    val conf: SparkConf = new SparkConf().setAppName(this.getClass.getCanonicalName).setMaster("local[*]")
    val sc = new SparkContext(conf)
    // Create the source RDD
    val lines: RDD[String] = sc.parallelize(List("laowang,3,15000", "laoli,5,20000", "laoma,1,23000", "laohu,5,15000"))
    // Parse each line into (name, years, salary)
    val disposed: RDD[(String, Int, Double)] = lines.mapPartitions(it => {
      it.map(line => {
        val fields: Array[String] = line.split(",")
        val name: String = fields(0)
        val years: Int = fields(1).toInt
        val salary: Double = fields(2).toDouble
        (name, years, salary)
      })
    })
    // Sort rule: build a Staff from each tuple and use it only as the sort key.
    // Staff takes three constructor arguments, so all three fields are passed.
    val result: RDD[(String, Int, Double)] = disposed.sortBy(x => new Staff(x._1, x._2, x._3))
    // Print the sorted records (still tuples)
    println(result.collect().toBuffer)
    sc.stop()
  }
}
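When the class serves only as a sort key, it can also be reduced to the fields that the rule actually compares. The following is a minimal sketch on an ordinary Scala collection; SalaryYearsKey, SortKeyDemo and sortedRows are hypothetical names that do not appear elsewhere in this post.

```scala
object SortKeyDemo {
  // Hypothetical key class: it carries only the two fields used by the rule.
  class SalaryYearsKey(val years: Int, val salary: Double)
      extends Ordered[SalaryYearsKey] with Serializable {
    override def compare(that: SalaryYearsKey): Int = {
      // Salary descending: the arguments are swapped
      val bySalary = java.lang.Double.compare(that.salary, this.salary)
      // Years ascending as the tie-breaker
      if (bySalary != 0) bySalary else Integer.compare(this.years, that.years)
    }
  }

  val rows: List[(String, Int, Double)] = List(
    ("laowang", 3, 15000.0),
    ("laoli", 5, 20000.0),
    ("laoma", 1, 23000.0),
    ("laohu", 5, 15000.0)
  )

  // The elements stay tuples; the key object only drives the comparison.
  def sortedRows: List[(String, Int, Double)] =
    rows.sortBy(r => new SalaryYearsKey(r._2, r._3))

  def main(args: Array[String]): Unit =
    println(sortedRows.map(_._1)) // List(laoma, laoli, laowang, laohu)
}
```

With years as the tie-breaker, laowang (3 years) comes before laohu (5 years), which differs from the name-based tie-break of method 1.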

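Method 3: Define a case class, Staff, that mixes in the Ordered trait (or implements Comparable); Serializable does not need to be declared

Override the compare method (for Ordered) or the compareTo method (for Comparable).

This is essentially the same as method 2, except that a case class is already serializable, so Serializable does not have to be mixed in by hand, and instances are created without new.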
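No listing is given for method 3, so here is a minimal sketch of what such a case class might look like, assuming the same rule as method 2 (salary descending, then years ascending). The names Staff2 and CaseClassOrderedDemo are hypothetical, and the sort runs on an ordinary Scala list rather than an RDD.

```scala
object CaseClassOrderedDemo {
  // Hypothetical case class: no "with Serializable" is needed,
  // because case classes are serializable by default.
  case class Staff2(name: String, years: Int, salary: Double) extends Ordered[Staff2] {
    override def compare(that: Staff2): Int = {
      // Salary descending: the arguments are swapped
      val bySalary = java.lang.Double.compare(that.salary, this.salary)
      // Years ascending as the tie-breaker
      if (bySalary != 0) bySalary else Integer.compare(this.years, that.years)
    }
  }

  // Instances are created without "new"
  val staff: List[Staff2] = List(
    Staff2("laowang", 3, 15000.0),
    Staff2("laoli", 5, 20000.0),
    Staff2("laoma", 1, 23000.0),
    Staff2("laohu", 5, 15000.0)
  )

  // sorted picks up the ordering derived from Ordered[Staff2]
  def sortedStaff: List[Staff2] = staff.sorted

  def main(args: Array[String]): Unit =
    println(sortedStaff.map(_.name)) // List(laoma, laoli, laowang, laohu)
}
```

On an RDD[Staff2] the equivalent call would be sortBy(x => x), exactly as in method 2.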

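Method 4: Define a case class, Staff3, that implements neither the Comparable interface nor the Ordered trait

Define the ordering rule as an implicit Ordering, and import it before calling the RDD's sortBy method.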

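object OrderContext {
  // Implicit object that supplies the Ordering for Staff3
  implicit object orderingObjectStaff extends Ordering[Staff3] {
    override def compare(x: Staff3, y: Staff3): Int = {
      if (x.salary == y.salary) {
        // Equal salaries: years ascending
        Integer.compare(x.years, y.years)
      } else {
        // Arguments are swapped so that the higher salary comes first
        java.lang.Double.compare(y.salary, x.salary)
      }
    }
  }

//  Equivalent alternative written as an implicit val:
//  implicit val orderingStaff: Ordering[Staff3] = new Ordering[Staff3] {
//    override def compare(x: Staff3, y: Staff3) = {
//      if (x.salary == y.salary) {
//        Integer.compare(x.years, y.years)
//      } else {
//        java.lang.Double.compare(y.salary, x.salary)
//      }
//    }
//  }

}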

Define the case class Staff3 and bring the implicit ordering into scope to get the custom sort.

case class Staff3(name:String, years:Int,salary:Double)

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

// Named MySort4 so that it does not clash with the MySort3 object of method 2
object MySort4 {
  def main(args: Array[String]): Unit = {
    val conf: SparkConf = new SparkConf().setAppName(this.getClass.getCanonicalName).setMaster("local[*]")
    val sc = new SparkContext(conf)
    // Create the source RDD
    val lines: RDD[String] = sc.parallelize(List("laowang,3,15000", "laoli,5,20000", "laoma,1,23000", "laohu,5,15000"))
    // Parse each line into a Staff3 instance
    val disposed: RDD[Staff3] = lines.mapPartitions(it => {
      it.map(line => {
        val fields: Array[String] = line.split(",")
        val name: String = fields(0)
        val years: Int = fields(1).toInt
        val salary: Double = fields(2).toDouble
        Staff3(name, years, salary)
      })
    })
    // Import the implicit ordering rule so that sortBy can find it
    import OrderContext.orderingObjectStaff
    val result: RDD[Staff3] = disposed.sortBy(t => t)

    // Print the sorted records
    println(result.collect().toBuffer)
    sc.stop()
  }
}
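The implicit ordering can also be written more compactly with Ordering.by from the Scala standard library, which derives an Ordering from a key function. This is a sketch of an alternative formulation, not the listing above; it assumes the same rule (salary descending, then years ascending), and OrderingByDemo and staffOrdering are hypothetical names.

```scala
object OrderingByDemo {
  case class Staff3(name: String, years: Int, salary: Double)

  // Map each Staff3 to a tuple key and reuse the built-in tuple ordering:
  // negated salary gives descending order, years breaks ties ascending.
  implicit val staffOrdering: Ordering[Staff3] =
    Ordering.by((s: Staff3) => (-s.salary, s.years))

  val staff: List[Staff3] = List(
    Staff3("laowang", 3, 15000.0),
    Staff3("laoli", 5, 20000.0),
    Staff3("laoma", 1, 23000.0),
    Staff3("laohu", 5, 15000.0)
  )

  // sorted resolves staffOrdering implicitly, just as sortBy(t => t) would on an RDD
  def sortedNames: List[String] = staff.sorted.map(_.name)

  def main(args: Array[String]): Unit =
    println(sortedNames) // List(laoma, laoli, laowang, laohu)
}
```

This keeps the rule in one expression and combines the ideas of method 1 (tuple keys) and method 4 (an implicit Ordering kept outside the class).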
