Let's take word count as the example.

sortBy:
sortBy lets you define the sort key (and direction) yourself.
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object sortByTest {
  def main(args: Array[String]): Unit = {
    val conf: SparkConf = new SparkConf().setMaster("local[*]").setAppName("sortByTest")
    val sc = new SparkContext(conf)
    val lines: RDD[String] = sc.parallelize(Array("hello java", "hello spark", "hello scala"))
    val word: RDD[String] = lines.flatMap(_.split(" "))   // split each line into words
    val A: RDD[(String, Int)] = word.map((_, 1))          // pair each word with a count of 1
    val B: RDD[(String, Int)] = A.reduceByKey(_ + _)      // sum the counts per word
    val C: RDD[(String, Int)] = B.sortBy(_._2, false)     // sort by count, descending
    // Note: with multiple partitions, foreach may print in arbitrary order;
    // use C.collect().foreach(println) to see the sorted order on the driver.
    C.foreach(println)
    sc.stop()
  }
}
Here _._2 selects the value of each pair as the sort key: for a pair like (hello, 3), the sort key is 3. The second argument, false, requests descending order; if you omit it, the default is ascending.
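The same sort-by-value semantics can be tried on a plain Scala collection without starting a Spark context (a minimal sketch; sortBy on a List takes the same key function, except that descending order is requested with a reversed Ordering rather than a false flag):

```scala
object SortBySketch {
  def main(args: Array[String]): Unit = {
    val counts = List(("hello", 3), ("java", 1), ("spark", 1))
    // Ascending by value, like B.sortBy(_._2) on an RDD
    val asc = counts.sortBy(_._2)
    // Descending by value; the RDD version takes a second argument `false` instead
    val desc = counts.sortBy(_._2)(Ordering[Int].reverse)
    println(asc)
    println(desc)
  }
}
```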
sortByKey:
sortByKey sorts a pair RDD by its keys. Let's look at the code:
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object sortByKeyTest {
  def main(args: Array[String]): Unit = {
    val conf: SparkConf = new SparkConf().setMaster("local[*]").setAppName("sortByKeyTest")
    val sc = new SparkContext(conf)
    val lines: RDD[String] = sc.parallelize(Array("hello java", "hello spark", "hello scala"))
    val word: RDD[String] = lines.flatMap(_.split(" "))   // split each line into words
    val A: RDD[(String, Int)] = word.map((_, 1))          // pair each word with a count of 1
    val B: RDD[(String, Int)] = A.reduceByKey(_ + _)      // sum the counts per word
    val C: RDD[(String, Int)] = B.sortByKey()             // ascending by key by default
    C.foreach(println)
    sc.stop()
  }
}
It sorts the pairs by their keys. Next, let's see how to sort by value using swap together with sortByKey:
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object sortByKeySwapTest {
  def main(args: Array[String]): Unit = {
    val conf: SparkConf = new SparkConf().setMaster("local[*]").setAppName("sortByKeySwapTest")
    val sc = new SparkContext(conf)
    val lines: RDD[String] = sc.parallelize(Array("hello java", "hello spark", "hello scala"))
    val word: RDD[String] = lines.flatMap(_.split(" "))   // split each line into words
    val A: RDD[(String, Int)] = word.map((_, 1))          // pair each word with a count of 1
    val B: RDD[(String, Int)] = A.reduceByKey(_ + _)      // sum the counts per word
    val C: RDD[(Int, String)] = B.map(_.swap)             // (word, count) -> (count, word)
    val D: RDD[(Int, String)] = C.sortByKey(false)        // sort by count, descending
    val E: RDD[(String, Int)] = D.map(_.swap)             // swap back to (word, count)
    E.foreach(println)
    sc.stop()
  }
}
Calling swap reverses each (key, value) pair, so (hello, 3) becomes (3, hello); sortByKey then sorts by the count, and a final swap restores the original (word, count) layout.
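The swap trick can be demonstrated on a plain Scala collection as well (a minimal sketch; sortBy with a reversed Ordering stands in for the Spark-specific sortByKey(false)):

```scala
object SwapSortSketch {
  def main(args: Array[String]): Unit = {
    val counts = List(("hello", 3), ("java", 1), ("spark", 1))
    val byValueDesc = counts
      .map(_.swap)                           // (word, count) -> (count, word)
      .sortBy(_._1)(Ordering[Int].reverse)   // sort by count, descending
      .map(_.swap)                           // swap back to (word, count)
    println(byValueDesc)
  }
}
```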