Sorting in Scala: sortBy and map examples



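I never really knew how to use sortBy and always reached for sortWith instead. Today I came across a sortBy idiom that struck me as elegant, so I'm sharing it here.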

val ll = List[(String, Int, Int)](("a", 1, 400), ("b", 3, 600), ("m", 3, 100), ("c", 2, 40))
println(ll)

// sort by age, ascending
val a = ll.map { case Tuple3(name: String, age: Int, salary: Int) => (name, age, salary) }.sortBy { case (_, age, _) => age }
println(s"a== $a")
// sort by age, descending
val aa = ll.map { case Tuple3(name: String, age: Int, salary: Int) => (name, age, salary) }.sortBy { case (_, age, _) => -age }
println(s"aa== $aa")
// sort by age, then by salary
val aaa = ll.map { case Tuple3(name: String, age: Int, salary: Int) => (name, age, salary) }.sortBy { case (_, age, salary) => (age, salary) }
println(s"aaa== $aaa")

// sort by (age, salary), then keep only the name
val b = ll.map { case Tuple3(name: String, age: Int, salary: Int) => (name, age, salary) }.sortBy { case (_, age, salary) => (age, salary) }.map { case (name, _, _) => name }
println(s"b== $b")

// sort by (age, salary), then keep name and salary
val bb = ll.map { case Tuple3(name: String, age: Int, salary: Int) => (name, age, salary) }.sortBy { case (_, age, salary) => (age, salary) }
  .map { case (name, _, salary) => (name, salary) }
println(s"bb== $bb")

Name the three fields of the tuple name, age, and salary:

ll.map{case Tuple3(name:String, age:Int, salary:Int)=>(name,age, salary)}
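A small sketch of what this naming step does. The type annotations in the pattern are optional, and since the function returns the same three fields, this particular map leaves the list unchanged; it only gives the fields names. The `Person` case class and the object name below are hypothetical, added only to show an alternative that keeps the names around after the map.

```scala
object NamingFields {
  val ll = List(("a", 1, 400), ("b", 3, 600), ("m", 3, 100), ("c", 2, 40))

  // Same pattern without type annotations; the element type is already known.
  val named = ll.map { case (name, age, salary) => (name, age, salary) }

  // Hypothetical case class (not in the original post): the names survive the map.
  final case class Person(name: String, age: Int, salary: Int)
  val people = ll.map { case (name, age, salary) => Person(name, age, salary) }

  def main(args: Array[String]): Unit = {
    println(named == ll) // true: this map is an identity on the list
    println(people.sortBy(_.age).map(_.name))
  }
}
```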

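Sort by one field. The default order is ascending (smallest first). The pattern after case must have as many slots as the tuple has elements; a field you don't need can be replaced with "_", but it cannot simply be left out.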

.sortBy{case(_,age,_)=>(age)}
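As a runnable sketch of the ascending sort on the sample list (the object name is just for illustration): the tuple has three elements, so the pattern has three slots, with `_` for the two that are not used. Scala's sortBy is stable, so the two entries with age 3 keep their original relative order.

```scala
object SortAscending {
  val ll = List(("a", 1, 400), ("b", 3, 600), ("m", 3, 100), ("c", 2, 40))

  // Three-element tuple, so three slots in the pattern; unused ones are `_`.
  val byAge = ll.sortBy { case (_, age, _) => age }

  // Equivalent form using the positional accessor instead of a pattern.
  val byAgeAccessor = ll.sortBy(_._2)

  def main(args: Array[String]): Unit = {
    println(byAge) // List((a,1,400), (c,2,40), (b,3,600), (m,3,100))
  }
}
```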

For descending order, just put "-" in front of age:

.sortBy{case(_,age,_)=>(-age)}
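Negation only works for numeric keys, and it overflows for `Int.MinValue`. One alternative, sketched here on the same sample data, is to pass a reversed `Ordering` explicitly; this also works for strings. The object and value names are illustrative only.

```scala
object SortDescending {
  val ll = List(("a", 1, 400), ("b", 3, 600), ("m", 3, 100), ("c", 2, 40))

  // Descending by age with an explicit reversed ordering instead of negation.
  val byAgeDesc = ll.sortBy(_._2)(Ordering[Int].reverse)

  // Descending by name: negation is not available for String, reverse is.
  val byNameDesc = ll.sortBy(_._1)(Ordering[String].reverse)

  // Mixed directions with a tuple key: age descending, then salary ascending.
  val ageDescSalaryAsc = ll.sortBy { case (_, age, salary) => (-age, salary) }

  def main(args: Array[String]): Unit = {
    println(byAgeDesc.map(_._1))        // List(b, m, c, a)
    println(byNameDesc.map(_._1))       // List(m, c, b, a)
    println(ageDescSalaryAsc.map(_._1)) // List(m, b, c, a)
  }
}
```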

Then use map to drop the fields you don't need:

.map{case (name,_,_) => name}
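Putting the steps together, here is a compact sketch of the sort-then-project chain on the sample list. It skips the initial naming map, since the patterns in sortBy and map already name the fields they use. The object name is illustrative.

```scala
object SortThenProject {
  val ll = List(("a", 1, 400), ("b", 3, 600), ("m", 3, 100), ("c", 2, 40))

  // Sort by (age, salary): tuples compare element by element, left to right.
  val sorted = ll.sortBy { case (_, age, salary) => (age, salary) }

  // Keep only the name.
  val names = sorted.map { case (name, _, _) => name }

  // Keep the name and the salary.
  val nameAndSalary = sorted.map { case (name, _, salary) => (name, salary) }

  def main(args: Array[String]): Unit = {
    println(names)         // List(a, c, m, b)
    println(nameAndSalary) // List((a,400), (c,40), (m,100), (b,600))
  }
}
```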

References: "Scala: reverse-sorting tuples by their first element" | "Scala's magical sortBy method"

Original example, from 王吉's SparrowRecSys:

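  // imports needed by this method
  import org.apache.spark.rdd.RDD
  import org.apache.spark.sql.{Row, SparkSession}
  import org.apache.spark.sql.expressions.UserDefinedFunction
  import org.apache.spark.sql.functions._

  def processItemSequence(sparkSession: SparkSession, rawSampleDataPath: String): RDD[Seq[String]] = {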

    //path of rating data
    val ratingsResourcesPath = this.getClass.getResource(rawSampleDataPath)
    val ratingSamples = sparkSession.read.format("csv").option("header", "true").load(ratingsResourcesPath.getPath)

    //sort by timestamp udf
    val sortUdf: UserDefinedFunction = udf((rows: Seq[Row]) => {
      rows.map { case Row(movieId: String, timestamp: String) => (movieId, timestamp) }
        .sortBy { case (_, timestamp) => timestamp }
        .map { case (movieId, _) => movieId }
    })

    ratingSamples.printSchema()

    //process rating data then generate rating movie sequence data
    val userSeq = ratingSamples
      .where(col("rating") >= 3.5)
      .groupBy("userId")
      .agg(sortUdf(collect_list(struct("movieId", "timestamp"))) as "movieIds")
      .withColumn("movieIdStr", array_join(col("movieIds"), " "))

    userSeq.select("userId", "movieIdStr").show(10, truncate = false)
    userSeq.select("movieIdStr").rdd.map(r => r.getAs[String]("movieIdStr").split(" ").toSeq)
  }
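The body of the UDF is the same map / sortBy / map chain applied to (movieId, timestamp) pairs. A Spark-free sketch on a plain Seq follows, using made-up sample rows. Note that the UDF above compares timestamps as strings, which orders them lexicographically; if the timestamps can differ in length, converting with `toLong` (an assumption that every value parses as a number) gives chronological order instead.

```scala
object SortByTimestamp {
  // Hypothetical sample data: (movieId, timestamp) pairs, both as strings.
  val rows = Seq(("m1", "1000"), ("m2", "999"), ("m3", "1001"))

  // Same shape as the UDF: sort by the timestamp string, keep the movie id.
  val asString = rows
    .sortBy { case (_, timestamp) => timestamp }
    .map { case (movieId, _) => movieId }

  // Variant that sorts numerically; assumes every timestamp parses as a Long.
  val asNumber = rows
    .sortBy { case (_, timestamp) => timestamp.toLong }
    .map { case (movieId, _) => movieId }

  def main(args: Array[String]): Unit = {
    println(asString) // List(m1, m3, m2): "1001" sorts before "999" as text
    println(asNumber) // List(m2, m1, m3): 999 < 1000 < 1001
  }
}
```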

