Scala-18: A Simple WordCount Example

牧码文

Category: Scala

Article link: https://blog.csdn.net/weixin_46429290/article/details/119887048

I. Case Analysis

Suppose we receive some input data, for example a list of strings:

List(
      "hello",
      "hello world",
      "hello scala",
      "hello spark from scala",
      "hello flink from scala"
    )

We want to count how many times each word occurs. The expected result is:

hello, 5
world, 1
scala, 3
spark, 1
flink, 1
from, 2

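Then sort the results by count in descending order and take the three most frequent words.

The concrete implementation steps are:

Take each line and split it into its individual words: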
```
"hello world" => hello, world
```

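Group the words so that identical words fall into the same group:

Map[String, List[String]] = Map(hello -> List(hello, hello, hello, hello, hello))

Take the length of each group's list, turning every entry into a word paired with its number of occurrences:
```
Map[String, List[String]] => Map[String, Int]
```
The previous step produces a Map. Convert it to a List, sort it by count in descending order, and take the first three entries:
```
Map[String, Int] => List[(String, Int)]
```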
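
The four steps above can be sketched as one small function. This is only an illustrative sketch under stated assumptions: the names `StepsSketch` and `topWords` are hypothetical and do not appear in the original article, whose own full program is given in section III.

```scala
object StepsSketch {
  // Hypothetical helper: returns the n most frequent words with their counts.
  def topWords(lines: List[String], n: Int): List[(String, Int)] = {
    // Step 1: split every line on a space, then flatten into one list of words
    val words: List[String] = lines.map(_.split(" ")).flatten
    // Step 2: group identical words together
    val groups: Map[String, List[String]] = words.groupBy(word => word)
    // Step 3: replace each group by its size
    val counts: Map[String, Int] = groups.map(kv => (kv._1, kv._2.length))
    // Step 4: convert to a list, sort by count descending, keep the first n
    counts.toList.sortWith(_._2 > _._2).take(n)
  }

  def main(args: Array[String]): Unit =
    println(topWords(List("hello", "hello world", "hello scala"), 1))
}
```

Words with equal counts may come out in any relative order, because a Map does not define an iteration order.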

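II. Methods Used

1: Transformation/mapping with map()

Applies a function to every element of a collection and returns a new collection of the results

list.map(f)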
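
As a small illustration of `map`, the sketch below applies two different functions to the same list. The object name `MapSketch` and the sample values are assumptions made for this example only.

```scala
object MapSketch {
  val lines: List[String] = List("hello", "hello world")

  // map applies the function to each element and builds a new list;
  // the original list is left unchanged.
  val lengths: List[Int] = lines.map(line => line.length)

  // The underscore is shorthand for a one-parameter function.
  val upper: List[String] = lines.map(_.toUpperCase)

  def main(args: Array[String]): Unit = {
    println(lengths) // List(5, 11)
    println(upper)   // List(HELLO, HELLO WORLD)
  }
}
```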

2: Splitting strings with split()

// split on a single space
list.map(_.split(" "))
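
The sketch below shows what `split` returns on sample input. `split` yields an `Array[String]`, so the results are converted with `toList` to make them easy to print and compare. The object name `SplitSketch` is hypothetical, and the `"\\s+"` variant is a common alternative that the article itself does not use.

```scala
object SplitSketch {
  // Splitting on a single space, as in the article
  val single: List[String] = "hello spark from scala".split(" ").toList

  // Two consecutive spaces produce an empty string between them
  val doubled: List[String] = "hello  world".split(" ").toList

  // The argument is a regular expression; "\\s+" treats a run of whitespace as one separator
  val regex: List[String] = "hello  world".split("\\s+").toList

  def main(args: Array[String]): Unit = {
    println(single) // List(hello, spark, from, scala)
    println(doubled) // List(hello, , world)
    println(regex)   // List(hello, world)
  }
}
```

For the input in this article a single space is enough, since the words are separated by exactly one space.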

3: Flattening with flatten

// flatten is called without parentheses
list.flatten
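
To make the effect of flattening concrete, the sketch below turns a list of arrays into a single list of words. It also shows `flatMap`, which is not used in the article but combines `map` and `flatten` into one call. The object name `FlattenSketch` is an assumption for illustration.

```scala
object FlattenSketch {
  val lines: List[String] = List("hello world", "hello scala")

  // After map + split we have one array of words per line
  val nested: List[Array[String]] = lines.map(_.split(" "))

  // flatten removes one level of nesting
  val flat: List[String] = nested.flatten

  // flatMap performs the map and the flatten in a single step
  val flatMapped: List[String] = lines.flatMap(_.split(" "))

  def main(args: Array[String]): Unit = {
    println(flat)       // List(hello, world, hello, scala)
    println(flatMapped) // List(hello, world, hello, scala)
  }
}
```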

4: Grouping a list with groupBy()

// group by the word itself
list.groupBy(word => word)
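
The following sketch shows the shape of what `groupBy` returns: a Map from each key to the list of elements that produced that key. The object name `GroupBySketch` and the second grouping (by word length) are assumptions added only for illustration.

```scala
object GroupBySketch {
  val words: List[String] = List("hello", "from", "hello", "scala")

  // Grouping by the word itself: each key maps to all of its occurrences
  val byWord: Map[String, List[String]] = words.groupBy(word => word)

  // Any function can serve as the key, for example the word length
  val byLength: Map[Int, List[String]] = words.groupBy(_.length)

  def main(args: Array[String]): Unit = {
    println(byWord("hello")) // List(hello, hello)
    println(byLength(5))     // List(hello, hello, scala)
  }
}
```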

5: Converting a Map to a List with toList

map.toList
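
As a brief sketch, `toList` on a Map yields a list of key/value tuples. The object name `ToListSketch` and the sample counts are assumptions for this example.

```scala
object ToListSketch {
  val counts: Map[String, Int] = Map("hello" -> 5, "scala" -> 3)

  // Each Map entry becomes a (key, value) tuple
  val pairs: List[(String, Int)] = counts.toList

  def main(args: Array[String]): Unit =
    pairs.foreach { case (word, count) => println(s"$word, $count") }
}
```

A Map does not promise any particular ordering of its entries, which is why the list is sorted explicitly in the next step.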

6: Sorting

// sort by occurrence count, descending
list.sortWith(_._2 > _._2)
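
The sketch below sorts a list of (word, count) tuples in descending order in three equivalent ways. Only `sortWith` comes from the article. The two `sortBy` forms are alternatives shown for comparison, and the object name `SortSketch` is hypothetical.

```scala
object SortSketch {
  val counts: List[(String, Int)] = List(("world", 1), ("hello", 5), ("scala", 3))

  // sortWith takes a comparison: true when the first element should come first
  val bySortWith: List[(String, Int)] = counts.sortWith(_._2 > _._2)

  // sortBy takes a key; negating the count gives descending order
  val bySortBy: List[(String, Int)] = counts.sortBy(-_._2)

  // sortBy with an explicitly reversed Ordering
  val byOrdering: List[(String, Int)] = counts.sortBy(_._2)(Ordering[Int].reverse)

  def main(args: Array[String]): Unit =
    println(bySortWith) // List((hello,5), (scala,3), (world,1))
}
```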

III. Code Implementation

object CommonWorldCount {
  def main(args: Array[String]): Unit = {

    val stringList: List[String] = List(
      "hello",
      "hello world",
      "hello scala",
      "hello spark from scala",
      "hello flink from scala"
    )

    // 1. Split each string on spaces, then flatten into a single list of words
    val wordList1: List[Array[String]] = stringList.map(_.split(" "))
    val wordList2: List[String] = wordList1.flatten

    // 2. Group identical words together
    val groupMap: Map[String, List[String]] = wordList2.groupBy(word => word)

    // 3. Take the length of each group's list to get the count for each word
    val countMap: Map[String, Int] = groupMap.map(kv => (kv._1, kv._2.length))

    // 4. Convert to a list, sort, and take the top three
    // Convert to a list
    val countList: List[(String, Int)] = countMap.toList
    // Sort by count, descending
    val sortList: List[(String, Int)] = countList.sortWith(_._2 > _._2)
    // Take the first three
    val preList = sortList.take(3)
    println(preList)

    // Step 4 can be collapsed into a single chain
    val preList2 = countMap.toList
      .sortWith(_._2 > _._2)
      .take(3)
    println(preList2)
  }
}
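
For comparison, the whole pipeline can be written more compactly. The sketch below is an alternative to the article's implementation, not a replacement for it. It assumes Scala 2.13 or later, where `groupMapReduce` is available on collections, and the names `WordCountCompact` and `topWords` are hypothetical. It also breaks ties alphabetically so the result is deterministic, which the article's version does not do.

```scala
object WordCountCompact {
  // Hypothetical helper: the n most frequent words, ties broken alphabetically.
  def topWords(lines: List[String], n: Int): List[(String, Int)] =
    lines
      .flatMap(_.split(" "))                    // split and flatten in one step
      .groupMapReduce(identity)(_ => 1)(_ + _)  // key by word, map each to 1, sum the ones
      .toList
      .sortBy { case (word, count) => (-count, word) } // count descending, then word ascending
      .take(n)

  def main(args: Array[String]): Unit = {
    val stringList = List(
      "hello",
      "hello world",
      "hello scala",
      "hello spark from scala",
      "hello flink from scala"
    )
    println(topWords(stringList, 3)) // List((hello,5), (scala,3), (from,2))
  }
}
```

`groupMapReduce` avoids building the intermediate list of repeated words for each group, since it sums the counts while grouping.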
