Scala custom sorting explained
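Background: when sorting with an RDD's sortBy or sortByKey method, we often need flexible, custom sorting rules.
Scenario:
We have records of the form (name, years of service, salary) and want to sort them by salary in descending order. When salaries are equal, the tie is broken in ascending order: by name in method 1, and by years of service in methods 2 to 4.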
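Method 1: take advantage of how tuples are ordered
Tuple ordering rule: compare the first field; if it is equal, compare the second; if that is also equal, compare the third, and so on.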
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object MySort1 {
  def main(args: Array[String]): Unit = {
    val conf: SparkConf = new SparkConf().setAppName(this.getClass.getCanonicalName).setMaster("local[*]")
    val sc = new SparkContext(conf)
    // Create the source RDD
    val lines: RDD[String] = sc.parallelize(List("laowang,3,15000", "laoli,5,20000", "laoma,1,23000", "laohu,5,15000"))
    // Parse each line into a (name, years, salary) tuple
    val disposed: RDD[(String, Int, Double)] = lines.mapPartitions(it => {
      it.map(line => {
        val fields: Array[String] = line.split(",")
        val name: String = fields(0)
        val years: Int = fields(1).toInt
        val salary: Double = fields(2).toDouble
        (name, years, salary)
      })
    })
    // Sort by a tuple key: negating salary gives salary descending, then name ascending
    val result: RDD[(String, Int, Double)] = disposed.sortBy(t => (-t._3, t._1))
    // Print the result
    println(result.collect().toBuffer)
    sc.stop()
  }
}
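The tuple rule can be checked without a cluster. The sketch below is an illustration rather than part of the original job: it applies the same key function to a plain Scala List, whose sortBy also takes an implicit Ordering for the key, using the sample records from MySort1. TupleOrderDemo is a hypothetical name.

```scala
// Plain-Scala check of the tuple ordering used in MySort1 (no Spark required).
// TupleOrderDemo is a hypothetical name used only for this illustration.
object TupleOrderDemo {
  // Tuples compare field by field: first element, then the second one on a tie.
  val pairs: List[(Int, String)] = List((2, "b"), (1, "z"), (2, "a")).sorted

  // The sample records from MySort1, already parsed into (name, years, salary).
  val rows: List[(String, Int, Double)] = List(
    ("laowang", 3, 15000.0),
    ("laoli", 5, 20000.0),
    ("laoma", 1, 23000.0),
    ("laohu", 5, 15000.0)
  )

  // Same key as disposed.sortBy(t => (-t._3, t._1)): negating the salary turns the
  // ascending tuple order into salary-descending, and the name breaks ties ascending.
  val sortedRows: List[(String, Int, Double)] = rows.sortBy(t => (-t._3, t._1))

  def main(args: Array[String]): Unit = {
    println(pairs)
    println(sortedRows)
  }
}
```

The two 15000 salaries come out as laohu before laowang, because the name decides the tie.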
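Method 2: create an ordinary class, Staff, that mixes in the Ordered trait (or implements Comparable) and is also Serializable, and override compare (or compareTo when implementing Comparable). Serializable is needed because Spark serializes the sort keys when it samples and shuffles them.
1. Wrapping the data in the class (override toString so the sorted results print readably)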
class Staff(val name: String, val years: Int, val salary: Double) extends Ordered[Staff] with Serializable {
  // Salary descending; when salaries are equal, years of service ascending
  override def compare(that: Staff): Int = {
    if (this.salary == that.salary) {
      this.years - that.years
    } else {
      java.lang.Double.compare(that.salary, this.salary)
    }
  }
  override def toString = s"Staff($name, $years, $salary)"
}
Because compare now defines the custom ordering, the instances can simply be sorted by themselves.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object MySort2 {
  def main(args: Array[String]): Unit = {
    val conf: SparkConf = new SparkConf().setAppName(this.getClass.getCanonicalName).setMaster("local[*]")
    val sc = new SparkContext(conf)
    // Create the source RDD
    val lines: RDD[String] = sc.parallelize(List("laowang,3,15000", "laoli,5,20000", "laoma,1,23000", "laohu,5,15000"))
    // Parse each line into a Staff instance
    val disposed: RDD[Staff] = lines.mapPartitions(it => {
      it.map(line => {
        val fields: Array[String] = line.split(",")
        val name: String = fields(0)
        val years: Int = fields(1).toInt
        val salary: Double = fields(2).toDouble
        new Staff(name, years, salary)
      })
    })
    // Sort rule: sort the instances by themselves, using Staff.compare
    val result: RDD[Staff] = disposed.sortBy(x => x)
    // Print the result
    println(result.collect().toBuffer)
    sc.stop()
  }
}
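To see which order this compare method produces, here is a sketch that copies the Staff class and sorts a plain List with sorted. The Ordering[Staff] that sorted picks up is derived from Ordered[Staff], the same kind of implicit Ordering that sortBy(x => x) needs in MySort2. StaffOrderDemo is a hypothetical name, and no Spark is involved.

```scala
// Copy of the Staff class from method 2 so that this snippet compiles on its own.
class Staff(val name: String, val years: Int, val salary: Double)
    extends Ordered[Staff] with Serializable {
  // Salary descending; on equal salary, years of service ascending
  override def compare(that: Staff): Int =
    if (this.salary == that.salary) Integer.compare(this.years, that.years)
    else java.lang.Double.compare(that.salary, this.salary)

  override def toString = s"Staff($name, $years, $salary)"
}

// StaffOrderDemo is a hypothetical name used only for this check.
object StaffOrderDemo {
  val staff: List[Staff] = List(
    new Staff("laowang", 3, 15000),
    new Staff("laoli", 5, 20000),
    new Staff("laoma", 1, 23000),
    new Staff("laohu", 5, 15000)
  )

  // sorted uses an Ordering[Staff] derived from Ordered[Staff].
  val sortedStaff: List[Staff] = staff.sorted

  def main(args: Array[String]): Unit = println(sortedStaff)
}
```

Unlike method 1, the tie at 15000 is broken by years of service, so laowang (3 years) comes before laohu (5 years).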
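2. Not wrapping the data in the class
The records stay as tuples. A Staff instance is created only as the sort key, so the class's ordering rule decides the order without converting the data itself.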
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object MySort3 {
  def main(args: Array[String]): Unit = {
    val conf: SparkConf = new SparkConf().setAppName(this.getClass.getCanonicalName).setMaster("local[*]")
    val sc = new SparkContext(conf)
    // Create the source RDD
    val lines: RDD[String] = sc.parallelize(List("laowang,3,15000", "laoli,5,20000", "laoma,1,23000", "laohu,5,15000"))
    // Parse each line into a (name, years, salary) tuple
    val disposed: RDD[(String, Int, Double)] = lines.mapPartitions(it => {
      it.map(line => {
        val fields: Array[String] = line.split(",")
        val name: String = fields(0)
        val years: Int = fields(1).toInt
        val salary: Double = fields(2).toDouble
        (name, years, salary)
      })
    })
    // Sort rule: build a Staff from each tuple as the sort key; the elements stay tuples
    val result: RDD[(String, Int, Double)] = disposed.sortBy(x => new Staff(x._1, x._2, x._3))
    // Print the result
    println(result.collect().toBuffer)
    sc.stop()
  }
}
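If building a full Staff just to sort feels heavy, the key can hold only the fields the ordering looks at. The sketch below uses a hypothetical StaffKey(years, salary) class, not from the original, with the same rule as Staff, and sorts a plain List of tuples with it. In MySort3 the corresponding call would be disposed.sortBy(x => new StaffKey(x._2, x._3)).

```scala
// Hypothetical key class holding only the fields the ordering compares.
// It stays Serializable so that it could also be used as a Spark sort key.
class StaffKey(val years: Int, val salary: Double) extends Ordered[StaffKey] with Serializable {
  // Salary descending; on equal salary, years of service ascending
  override def compare(that: StaffKey): Int =
    if (this.salary == that.salary) Integer.compare(this.years, that.years)
    else java.lang.Double.compare(that.salary, this.salary)
}

// KeyOrderDemo is a hypothetical name used only for this check.
object KeyOrderDemo {
  val rows: List[(String, Int, Double)] = List(
    ("laowang", 3, 15000.0),
    ("laoli", 5, 20000.0),
    ("laoma", 1, 23000.0),
    ("laohu", 5, 15000.0)
  )

  // The elements remain tuples; StaffKey is built only to decide their order.
  val sortedRows: List[(String, Int, Double)] = rows.sortBy(t => new StaffKey(t._2, t._3))

  def main(args: Array[String]): Unit = println(sortedRows)
}
```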
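Method 3: create a case class Staff that mixes in the Ordered trait (or implements Comparable), and override compare (or compareTo when implementing Comparable).
This is essentially the same as method 2, except that a case class is already Serializable, so Serializable does not need to be mixed in by hand, and instances are created without the new keyword.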
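The original gives no code for this method, so here is a minimal sketch of what it might look like. CaseStaff is a hypothetical name chosen so the snippet does not clash with the Staff class above; to stay runnable without a cluster it sorts a plain List, where in Spark the same class would be sorted with disposed.sortBy(x => x).

```scala
// Method 3 sketch: CaseStaff is a hypothetical name for the case-class version of Staff.
// A case class already extends Serializable and gets apply/toString generated,
// so neither "with Serializable", "new", nor a toString override is needed.
case class CaseStaff(name: String, years: Int, salary: Double) extends Ordered[CaseStaff] {
  // Salary descending; on equal salary, years of service ascending
  override def compare(that: CaseStaff): Int =
    if (this.salary == that.salary) Integer.compare(this.years, that.years)
    else java.lang.Double.compare(that.salary, this.salary)
}

object CaseStaffDemo {
  // No "new": the compiler-generated apply method builds each instance.
  val staff: List[CaseStaff] = List(
    CaseStaff("laowang", 3, 15000),
    CaseStaff("laoli", 5, 20000),
    CaseStaff("laoma", 1, 23000),
    CaseStaff("laohu", 5, 15000)
  )

  // In Spark this would be disposed.sortBy(x => x) on an RDD[CaseStaff];
  // a plain List is used here so the snippet runs on its own.
  val sortedStaff: List[CaseStaff] = staff.sorted

  def main(args: Array[String]): Unit = println(sortedStaff)
}
```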
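Method 4: create a case class, Staff3, that neither implements Comparable nor mixes in Ordered.
Instead, define the sorting rule as an implicit Ordering[Staff3] instance and import it before calling the RDD's sortBy method.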
object OrderContext {
  // Implicit Ordering instance for Staff3: salary descending, then years of service ascending
  implicit object orderingObjectStaff extends Ordering[Staff3] {
    override def compare(x: Staff3, y: Staff3): Int = {
      if (x.salary == y.salary) {
        x.years - y.years
      } else {
        java.lang.Double.compare(y.salary, x.salary)
      }
    }
  }
  // Equivalent alternative: an implicit val holding an anonymous Ordering instance
  // implicit val orderingStaff: Ordering[Staff3] = new Ordering[Staff3] {
  //   override def compare(x: Staff3, y: Staff3) = {
  //     if (x.salary == y.salary) {
  //       x.years - y.years
  //     } else {
  //       java.lang.Double.compare(y.salary, x.salary)
  //     }
  //   }
  // }
}
Define the case class Staff3, then bring the implicit Ordering into scope to get the custom sort.
case class Staff3(name: String, years: Int, salary: Double)
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object MySort4 {
  def main(args: Array[String]): Unit = {
    val conf: SparkConf = new SparkConf().setAppName(this.getClass.getCanonicalName).setMaster("local[*]")
    val sc = new SparkContext(conf)
    // Create the source RDD
    val lines: RDD[String] = sc.parallelize(List("laowang,3,15000", "laoli,5,20000", "laoma,1,23000", "laohu,5,15000"))
    // Parse each line into a Staff3 instance (no "new" needed for a case class)
    val disposed: RDD[Staff3] = lines.mapPartitions(it => {
      it.map(line => {
        val fields: Array[String] = line.split(",")
        val name: String = fields(0)
        val years: Int = fields(1).toInt
        val salary: Double = fields(2).toDouble
        Staff3(name, years, salary)
      })
    })
    // Import the implicit Ordering[Staff3] so that sortBy can find it
    import OrderContext.orderingObjectStaff
    val result: RDD[Staff3] = disposed.sortBy(t => t)
    // Print the result
    println(result.collect().toBuffer)
    sc.stop()
  }
}
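As a variation that is not in the original: if the Ordering is placed in Staff3's companion object, implicit search finds it through the implicit scope of Ordering[Staff3], so no import is needed at the call site. The sketch also swaps the hand-written compare for Ordering.by with a tuple key, reusing the tuple rule from method 1. It redefines Staff3 with a companion and sorts a plain List to stay self-contained; with Staff3 defined this way, disposed.sortBy(t => t) would pick up the same implicit.

```scala
// Variation on method 4: the implicit Ordering lives in Staff3's companion object,
// so it is found without an import. Ordering.by replaces the hand-written compare.
case class Staff3(name: String, years: Int, salary: Double)

object Staff3 {
  // Key (-salary, years): salary descending first, then years of service ascending.
  implicit val ordering: Ordering[Staff3] = Ordering.by((s: Staff3) => (-s.salary, s.years))
}

// CompanionOrderDemo is a hypothetical name used only for this check.
object CompanionOrderDemo {
  val staff: List[Staff3] = List(
    Staff3("laowang", 3, 15000),
    Staff3("laoli", 5, 20000),
    Staff3("laoma", 1, 23000),
    Staff3("laohu", 5, 15000)
  )

  // No import: Staff3.ordering is in the implicit scope of Ordering[Staff3].
  val sortedStaff: List[Staff3] = staff.sorted

  def main(args: Array[String]): Unit = println(sortedStaff)
}
```

Putting the Ordering in the companion makes it the default for the type; the separate OrderContext object used above is still the better fit when several different orderings should be chosen per call site by import.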