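Requirement:
Word count: count how many times each distinct word occurs in the collection, then keep the three words with the highest counts.
Implementation: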
val tupleList1 = List(("Hello Scala Spark World ", 4), ("Hello Scala Spark", 3), ("Hello Scala", 2), ("Hello", 1))
// 0) Expand each (string, count) tuple into one long string by repeating the string `count` times
val newList: List[String] = tupleList1.map(kv => (kv._1.trim + " ") * kv._2)
println("Step0(to string): " + newList)
// 1) Flatten: split every string on spaces to get the individual words
val wordList: List[String] = newList.flatMap(_.split(" "))
println("Step1(flatten): " + wordList)
// 2) Group identical words together, e.g. Map(Hello -> List(Hello, Hello, Hello, Hello))
val groupList: Map[String, List[String]] = wordList.groupBy(elem => elem)
println("Step2(group): " + groupList)
// 3) Transform each entry of the grouped map into (word, count), e.g. Map(Hello -> 10)
// Note: the function passed to map takes a single element (a key-value tuple), not two separate parameters
val countList: Map[String, Int] = groupList.map(kv => (kv._1, kv._2.size))
println("Step3(word count): " + countList)
// 4) Convert the map to a list of tuples, e.g. List((Hello,10), (Spark,7), (Scala,9), (World,4))
val tupleList: List[(String, Int)] = countList.toList
println("Step4(to tuples): " + tupleList)
// 5) Sort by count in descending order and take the top 3
// Alternative: val sortList: List[(String, Int)] = tupleList.sortBy(_._2).reverse.take(3)
val sortList: List[(String, Int)] = tupleList.sortWith(_._2 > _._2).take(3)
println("Step5(sort and take): " + sortList)
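The note in step 3 (the function passed to map receives one tuple, not two parameters) can be illustrated with a pattern-matching lambda, which names the key and the value without resorting to `_1` and `_2`. This is a minimal sketch; the object name `TupleParamDemo` and the method `countGroups` are made up for the illustration and are not part of the original code.

```scala
object TupleParamDemo {
  // Each element of a Map is ONE (key, value) tuple.
  // `case (word, occurrences)` destructures that single tuple;
  // it does not turn the lambda into a two-parameter function.
  def countGroups(groups: Map[String, List[String]]): Map[String, Int] =
    groups.map { case (word, occurrences) => (word, occurrences.size) }

  def main(args: Array[String]): Unit = {
    val groups = Map(
      "Hello" -> List("Hello", "Hello"),
      "Scala" -> List("Scala")
    )
    println(countGroups(groups))
  }
}
```

Writing `groups.map((word, occurrences) => ...)` instead would not compile in Scala 2, because that syntax declares a function of two parameters while map expects a function of one.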
Condensed version:
val wordCountList: List[(String, Int)] = tupleList1
  .map(tup => (tup._1.trim + " ") * tup._2)
  .flatMap(_.split(" "))
  .groupBy(elem => elem)
  .map(tup => (tup._1, tup._2.size))
  .toList
  .sortBy(tup => tup._2)
  .reverse
  .take(3)
println(wordCountList)
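Repeating each string `count` times works, but it builds large intermediate strings when the counts are big. One possible alternative, sketched below, keeps the count as a weight on every word and sums the weights per word. The names `WeightedWordCount` and `topWords` are hypothetical, and splitting on `\\s+` (instead of a single space) is an assumption made here to tolerate repeated whitespace.

```scala
object WeightedWordCount {
  // Returns the n words with the highest total weight, highest first.
  def topWords(data: List[(String, Int)], n: Int): List[(String, Int)] =
    data
      // Pair every word with the count of the string it came from.
      .flatMap { case (text, times) =>
        text.trim.split("\\s+").toList.filter(_.nonEmpty).map(word => (word, times))
      }
      // Group the (word, weight) pairs by word.
      .groupBy(_._1)
      // Sum the weights of each word.
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }
      .toList
      // Negating the count sorts in descending order without a separate reverse.
      .sortBy { case (_, count) => -count }
      .take(n)

  def main(args: Array[String]): Unit = {
    val tupleList1 = List(
      ("Hello Scala Spark World ", 4),
      ("Hello Scala Spark", 3),
      ("Hello Scala", 2),
      ("Hello", 1)
    )
    println(topWords(tupleList1, 3)) // List((Hello,10), (Scala,9), (Spark,7))
  }
}
```

For the sample data this gives the same top three as the versions above, since Hello, Scala and Spark total 10, 9 and 7.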
Output:
Step0(to string): List(Hello Scala Spark World Hello Scala Spark World Hello Scala Spark World Hello Scala Spark World , Hello Scala Spark Hello Scala Spark Hello Scala Spark , Hello Scala Hello Scala , Hello )
Step1(flatten): List(Hello, Scala, Spark, World, Hello, Scala, Spark, World, Hello, Scala, Spark, World, Hello, Scala, Spark, World, Hello, Scala, Spark, Hello, Scala, Spark, Hello, Scala, Spark, Hello, Scala, Hello, Scala, Hello)
Step2(group): Map(Hello -> List(Hello, Hello, Hello, Hello, Hello, Hello, Hello, Hello, Hello, Hello), Spark -> List(Spark, Spark, Spark, Spark, Spark, Spark, Spark), Scala -> List(Scala, Scala, Scala, Scala, Scala, Scala, Scala, Scala, Scala), World -> List(World, World, World, World))
Step3(word count): Map(Hello -> 10, Spark -> 7, Scala -> 9, World -> 4)
Step4(to tuples): List((Hello,10), (Spark,7), (Scala,9), (World,4))
Step5(sort and take): List((Hello,10), (Scala,9), (Spark,7))
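In this output every word has a different count, so the top three are unambiguous. If two words were tied, their relative order would come from the iteration order of the Map, which should not be relied on. A minimal sketch of one way to make ties deterministic is to sort on the pair (negated count, word); `StableTopN` and `topN` are hypothetical names used only for this example.

```scala
object StableTopN {
  // Sort by count descending, then by word ascending to break ties.
  def topN(counts: Map[String, Int], n: Int): List[(String, Int)] =
    counts.toList
      .sortBy { case (word, count) => (-count, word) }
      .take(n)

  def main(args: Array[String]): Unit = {
    val counts = Map("Spark" -> 7, "Scala" -> 7, "Hello" -> 10, "World" -> 4)
    println(topN(counts, 3)) // List((Hello,10), (Scala,7), (Spark,7))
  }
}
```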