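Implementing a word count in Scala:
List( ("Hello Scala World", 4), ("Hello World", 3), ("Hello Hadoop", 2), ("Hello Hbase", 1) )
Each tuple pairs a line of text with the number of times that line occurs. Count how often each word in the collection above occurs, sort the words by count in descending order, and take the top 3.
Approach 1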
object WordCount {
  def main(args: Array[String]): Unit = {
    val linesList = List(("Hello Scala World", 4), ("Hello World", 3), ("Hello Hadoop", 2), ("Hello Hbase", 1))
    // Split each line into individual words, carrying the line's count along with every word
    // ("Hello Scala World", 4)
    // => [ (Hello), (Scala), (World) ]
    // => [ (Hello,4), (Scala,4), (World,4) ]
    val flatMapList: List[(String, Int)] = linesList.flatMap(t => {
      val line: String = t._1
      val words = line.split(" ")
      words.map(w => (w, t._2))
    })
    // Group the pairs by word
    // Hello -> List((Hello,4), (Hello,3), (Hello,2), (Hello,1))
    // ==> List((4), (3), (2), (1))
    // ==> list.sum
    val groupWordMap: Map[String, List[(String, Int)]] = flatMapList.groupBy(t => t._1)
    // Convert each group into a (word, total count) pair
    /*
    val wordToSumMap: Map[String, Int] = groupWordMap.map(t => {
      val countList: List[Int] = t._2.map(tt => tt._2)
      (t._1, countList.sum)
    })
    */
    // mapValues transforms only the values of a Map; the keys stay unchanged.
    // Note: on Scala 2.13+, mapValues is deprecated and returns a MapView, so this line
    // would need to be written as groupWordMap.view.mapValues(...).toMap there.
    val wordToSumMap: Map[String, Int] = groupWordMap.mapValues(datas => datas.map(tt => tt._2).sum)
    // Sort the totals in descending order
    val sortList: List[(String, Int)] = wordToSumMap.toList.sortWith((left, right) => left._2 > right._2)
    // Take the first 3 entries of the sorted list
    val resultList: List[(String, Int)] = sortList.take(3)
    println(resultList)
  }
}
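The flatMap, groupBy, and sum steps above can also be folded into a single pass. The following is only a minimal sketch, not the original author's method: the object name `WeightedWordCountSketch` and the helper `topN` are hypothetical names introduced for illustration, and the tie-breaking by word is an added assumption so that equal counts come out in a predictable order. It uses only `foldLeft`, `Map.updated`, and `sortBy`, so it should behave the same on Scala 2.12 and 2.13.

```scala
object WeightedWordCountSketch {
  // Hypothetical helper (not from the original post): adds each line's count
  // to every word on that line in one pass, then returns the n largest totals.
  def topN(lines: List[(String, Int)], n: Int): List[(String, Int)] = {
    val totals: Map[String, Int] =
      lines.foldLeft(Map.empty[String, Int]) { case (acc, (line, weight)) =>
        // Split on runs of whitespace and skip empty tokens.
        line.split("\\s+").filter(_.nonEmpty).foldLeft(acc) { (m, word) =>
          m.updated(word, m.getOrElse(word, 0) + weight)
        }
      }
    // Sort by count descending, then by word (assumed tie-break rule).
    totals.toList.sortBy { case (word, count) => (-count, word) }.take(n)
  }

  def main(args: Array[String]): Unit = {
    val linesList = List(("Hello Scala World", 4), ("Hello World", 3), ("Hello Hadoop", 2), ("Hello Hbase", 1))
    println(topN(linesList, 3)) // List((Hello,10), (World,7), (Scala,4))
  }
}
```

The totals follow directly from the input: Hello = 4 + 3 + 2 + 1 = 10, World = 4 + 3 = 7, Scala = 4.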
Approach 2
object WordCount {
  def main(args: Array[String]): Unit = {
    // Raw data: in ("Hello Scala World", 4), the 4 is the number of times the line occurs
    val tuples: List[(String, Int)] = List(("Hello Scala World", 4), ("Hello World", 3), ("Hello Hadoop", 2), ("Hello Hbase", 1))
    // Expand ("Hello Scala World", 4) back into plain text: "Hello Scala World " * 4
    // List(Hello Scala World Hello Scala World Hello Scala World Hello Scala World , Hello World Hello World Hello World , ...)
    val strings: List[String] = tuples.map(t => (t._1 + " ") * t._2)
    println(strings)
    // Flatten: split every string in the list on spaces
    val flatMapList: List[String] = strings.flatMap(t => t.split(" "))
    println(flatMapList)
    // Group identical words together
    val stringToStrings: Map[String, List[String]] = flatMapList.groupBy(words => words)
    println(stringToStrings)
    // Count each group: (Hello,10), (World,7), (Scala,4), ...
    val stringToInt: Map[String, Int] = stringToStrings.map(t => (t._1, t._2.size))
    println(stringToInt)
    // Sort in descending order of count
    println(stringToInt.toList)
    val sortWithList: List[(String, Int)] = stringToInt.toList.sortWith((left, right) => {
      left._2 > right._2
    })
    println(sortWithList)
    // Take the top 3 entries
    val result: List[(String, Int)] = sortWithList.take(3)
    println(result)
  }
}
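Approach 2 rebuilds the full text by string repetition before splitting it again. As a rough sketch of the same idea without the intermediate long strings, each line's words can be repeated as list elements with `List.fill`. The object name `ExpandedWordCountSketch` and the helper `expand` are hypothetical names used only for this illustration; they are not part of the original post.

```scala
object ExpandedWordCountSketch {
  // Hypothetical helper (not from the original post): repeats the words of
  // each line `count` times as list elements instead of concatenating strings.
  def expand(lines: List[(String, Int)]): List[String] =
    lines.flatMap { case (line, count) =>
      val words = line.split(" ").toList
      List.fill(count)(words).flatten
    }

  def main(args: Array[String]): Unit = {
    val tuples = List(("Hello Scala World", 4), ("Hello World", 3), ("Hello Hadoop", 2), ("Hello Hbase", 1))
    // Group identical words and count the size of each group, as in Approach 2.
    val counts: Map[String, Int] =
      expand(tuples).groupBy(identity).map { case (word, occurrences) => (word, occurrences.size) }
    // Sort by count descending and keep the top 3.
    println(counts.toList.sortBy { case (_, n) => -n }.take(3)) // List((Hello,10), (World,7), (Scala,4))
  }
}
```

Either way, the expanded list grows with the sum of the counts (24 words for this input), so for large counts the weighted sum used in Approach 1 is likely the cheaper choice.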