Sorting operators
sortBy and friends
1. Convert each record into a tuple inside the RDD
// sort by price
// (assumes a SparkContext `sc` and the ImplicitAspect import, as in the full examples below)
val rdd = sc.parallelize(List("iphone5 1000 20", "iphone6 2000 50",
  "iphone7 2000 100", "iphone11 5000 50"))
val product = rdd.map(x => {
  // split on whitespace
  val split = x.split(" ")
  val name = split(0)
  val price = split(1).toDouble
  val amount = split(2).toInt
  (name, price, amount)
}).sortBy(x => x._2) // sort key: the price field
// print the data (printInfo() is an extension method from com.bigdata.spark.utils.ImplicitAspect)
product.printInfo()
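sortBy sorts ascending by default; its second parameter flips the order. A minimal sketch on the same tuple RDD:
// sort by price descending instead of ascending
product.sortBy(x => x._2, ascending = false).printInfo()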
2. Use a custom class
Make the class extend Ordered[ProductInfoV1] with Serializable and implement compare().
import com.bigdata.spark.utils.ImplicitAspect._
import org.apache.spark.{SparkConf, SparkContext}

object SortApp02 {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setMaster("local[2]").setAppName("my-spark")
    val sc = new SparkContext(sparkConf)
    val rdd = sc.parallelize(List("iphone5 1000 20", "iphone6 2000 50", "iphone7 2000 100", "iphone11 5000 50"))
    val product = rdd.map(x => {
      // split on whitespace
      val split = x.split(" ")
      val name = split(0)
      val price = split(1).toDouble
      val amount = split(2).toInt
      new ProductInfoV1(name, price, amount)
    }).sortBy(x => x)
    // print the results
    product.printInfo()
    sc.stop()
  }
}

class ProductInfoV1(val name: String, val price: Double, val amount: Int)
  extends Ordered[ProductInfoV1] with Serializable {

  // override compare: order by price ascending
  // (price.compare avoids the truncation bug of (this.price - that.price).toInt,
  //  which maps differences smaller than 1 to 0)
  override def compare(that: ProductInfoV1): Int = {
    this.price.compare(that.price)
  }

  // override toString for readable output
  override def toString: String = {
    name + "\t" + price + "\t" + amount
  }
}
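Because ProductInfoV1 now carries its own ordering, other order-aware operations can reuse it. As a small sketch (reusing the `product` RDD above), takeOrdered returns the N smallest elements without sorting the whole RDD:
// the 2 cheapest products, ordered by the compare() defined on ProductInfoV1
val cheapest = product.takeOrdered(2)
cheapest.foreach(println)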
3. Refactor to a case class (recommended)
A case class is recommended mainly because:
1. it is serializable automatically
2. toString is generated automatically
3. no new is needed to construct it
(A quick sketch of these three points follows; the full sorting example comes after it.)
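A minimal, self-contained sketch of what the compiler generates for a case class (the Demo class here is illustrative only, not part of the Spark example; try it in the REPL):
case class Demo(name: String, price: Double)   // automatically extends Product with Serializable

val d = Demo("iphone5", 1000)   // no `new`: the generated companion apply() is used
println(d)                      // Demo(iphone5,1000.0) -- toString comes for free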
import com.bigdata.spark.utils.ImplicitAspect._
import org.apache.spark.{SparkConf, SparkContext}

object SortApp03 {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setMaster("local[2]").setAppName("my-spark")
    val sc = new SparkContext(sparkConf)
    val rdd = sc.parallelize(List("iphone5 1000 20", "iphone6 2000 50",
      "iphone7 2000 100", "iphone11 5000 50"))
    val product = rdd.map(x => {
      // split on whitespace
      val split = x.split(" ")
      val name = split(0)
      val price = split(1).toDouble
      val amount = split(2).toInt
      ProductInfoV2(name, price, amount)
    }).sortBy(x => x)
    // print the results
    product.printInfo()
    sc.stop()
  }
}

case class ProductInfoV2(name: String, price: Double, amount: Int)
  extends Ordered[ProductInfoV2] {
  // override compare: order by price ascending
  override def compare(that: ProductInfoV2): Int = {
    this.price.compare(that.price)
  }
}
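Since the ordering here depends on a single field, the Ordered mix-in is not strictly required either: sortBy can take the field itself as the key and fall back on the built-in Ordering[Double]. A minimal sketch on the mapped RDD:
// sort by price directly; no Ordered / custom Ordering involved
val byPriceAsc  = product.sortBy(_.price)
val byPriceDesc = product.sortBy(_.price, ascending = false)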
4. Case class plus an implicit conversion (recommended)
Suppose the following class is given and must not be modified:
case class ProductInfoV2(name: String, price: Double, amount: Int)
It can then be enriched with an implicit conversion at the call site:
import com.bigdata.spark.utils.ImplicitAspect._
import org.apache.spark.{SparkConf, SparkContext}

import scala.language.implicitConversions

object SortApp03 {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setMaster("local[2]").setAppName("my-spark")
    val sc = new SparkContext(sparkConf)
    val rdd = sc.parallelize(List("iphone5 1000 20", "iphone6 2000 50",
      "iphone7 2000 100", "iphone11 5000 50"))

    // implicit conversion: enrich ProductInfoV2 with Ordered without modifying the class
    implicit def product2Ordered(product: ProductInfoV2): Ordered[ProductInfoV2] =
      new Ordered[ProductInfoV2] {
        override def compare(that: ProductInfoV2): Int = {
          product.price.compare(that.price)
        }
      }

    val product = rdd.map(x => {
      // split on whitespace
      val split = x.split(" ")
      val name = split(0)
      val price = split(1).toDouble
      val amount = split(2).toInt
      ProductInfoV2(name, price, amount)
    }).sortBy(x => x)
    // print the results
    product.printInfo()
    sc.stop()
  }
}
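An alternative with the same effect, sketched under the same setup: instead of a conversion to Ordered, put an Ordering[ProductInfoV2] into implicit scope (here built with Ordering.by) and sortBy picks it up directly:
// alternative: an implicit Ordering, no conversion to Ordered needed
implicit val byPrice: Ordering[ProductInfoV2] = Ordering.by(_.price)
// `product` stands for the mapped RDD[ProductInfoV2] from the example above
val sorted = product.sortBy(x => x)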
5. Ordering with .on (implicit)
Requirement: sort by price ascending; when prices are equal, sort by amount descending.
The pattern for an Ordering built with .on is:
implicit val ord = Ordering[<type of the sort key>].on[<type of the data>](x => <sort key>)
import com.bigdata.spark.utils.ImplicitAspect._
import org.apache.spark.{SparkConf, SparkContext}

object SortApp01 {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setMaster("local[2]").setAppName("my-spark")
    val sc = new SparkContext(sparkConf)
    val rdd = sc.parallelize(List("iphone5 1000 20", "iphone6 2000 50",
      "iphone7 2000 100", "iphone11 5000 50"))
    val product = rdd.map(x => {
      // split on whitespace
      val split = x.split(" ")
      val name = split(0)
      val price = split(1).toDouble
      val amount = split(2).toInt
      (name, price, amount)
    })

    /**
     * (x._2, -x._3)         the sort key: price ascending, then amount descending (negated)
     * (Double, Int)         the type of the sort key
     * (String, Double, Int) the type of the data being sorted
     */
    implicit val ord = Ordering[(Double, Int)].on[(String, Double, Int)](x => (x._2, -x._3))
    product.sortBy(x => x).printInfo()
    sc.stop()
  }
}
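For this particular requirement the implicit can also be dropped altogether by sorting on a composite tuple key; the built-in tuple Ordering compares the fields left to right. A minimal equivalent sketch on the same `product` RDD:
// price ascending, then amount descending (negated), via the default tuple Ordering
product.sortBy(x => (x._2, -x._3)).printInfo()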