Implementing WordCount in Scala

落花流水i

Article link: https://blog.csdn.net/weixin_44080445/article/details/109501445

object wordCount{
  def main(args: Array[String]): Unit = {
    val str = List("hadoop hive hadoop","hive hello mysql pig hello hadoop")
    val res1 = str.flatMap((s: String) => s.split(" ")) // 1. Split each line into words on spaces
    // res1 = List(hadoop, hive, hadoop, hive, hello, mysql, pig, hello, hadoop)

    val res2 = res1.map((x: String) => (x, 1)) // 2. Turn each word into a (key, value) pair of the form (word, 1)
    // res2 = List((hadoop,1), (hive,1), (hadoop,1), (hive,1), (hello,1), (mysql,1), (pig,1), (hello,1), (hadoop,1))

    val res3 = res2.groupBy((x: (String, Int)) => x._1) // 3. Group the pairs by word; x._1 is the first element of the tuple
    // res3 = Map(hadoop -> List((hadoop,1), (hadoop,1), (hadoop,1)), hive -> List((hive,1), (hive,1)), mysql -> List((mysql,1)), hello -> List((hello,1), (hello,1)), pig -> List((pig,1)))

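    // 4. Count each word: the size of a word's group is its number of occurrences.
    //    toList turns the Map into a List of (word, count) pairs; it is optional, since a Map can be mapped directly.
    val res4 = res3.toList.map((x: (String, List[(String, Int)])) => (x._1, x._2.size))
    // res4 = List((hadoop,3), (hive,2), (mysql,1), (hello,2), (pig,1))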

    /*  Simplified version of the code above
val res2 = res1.map((_,1))
val res3 = res2.groupBy(_._1)
val res4 = res3.toList.map((x)=>(x._1,x._2.size))
val res5 =str.flatMap(_.split(" ")).map((_,1)).groupBy(_._1).toList.map((x)=>(x._1,x._2.size))
     */

    /*  Final condensed version of the wordcount program
    val res5 =str.flatMap(_.split(" ")).map((_,1)).groupBy(_._1).toList.map((x)=>(x._1,x._2.size))
    println("res5= "+res5)
     */
    for(item <- res4){
      println(item)
    }

  }
}
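Step 4 above converts the Map to a List before counting. As a minimal sketch of an alternative, the grouping can be done on the words themselves and the resulting Map mapped directly, skipping both the (word, 1) pairs and the toList call. The object name `WordCountSketch` and the helper `countWords` are hypothetical names introduced for illustration; they are not part of the original post.

```scala
object WordCountSketch {
  // Hypothetical helper (not from the original post): counts the words in a list of lines.
  def countWords(lines: List[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))   // split each line into words on single spaces
      .groupBy(identity)       // Map(word -> List(word, word, ...))
      .map { case (word, occurrences) => (word, occurrences.size) }

  def main(args: Array[String]): Unit = {
    val lines = List("hadoop hive hadoop", "hive hello mysql pig hello hadoop")
    // Prints one (word, count) pair per line; the iteration order of a Map is not guaranteed.
    countWords(lines).foreach(println)
  }
}
```

The result is a Map[String, Int] rather than a List of pairs, which is convenient when counts are looked up by word afterwards.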

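Output:

The for loop prints each element of res4 on its own line, i.e. the pairs (hadoop,3), (hive,2), (mysql,1), (hello,2) and (pig,1). Because res4 is built from a Map, the order of the lines may differ between Scala versions.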
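Since the pairs come from a Map, the order in which they are printed is not guaranteed. A minimal sketch of sorting the result before printing, by count in descending order and then by word, might look like the following. The names `SortedWordCount` and `sortCounts` are hypothetical, and the input list simply repeats the res4 value given in the code comments above.

```scala
object SortedWordCount {
  // Hypothetical helper: orders (word, count) pairs by count descending, then by word ascending.
  def sortCounts(counts: List[(String, Int)]): List[(String, Int)] =
    counts.sortBy { case (word, count) => (-count, word) }

  def main(args: Array[String]): Unit = {
    // Same pairs as the res4 comment in the original program.
    val res4 = List(("hadoop", 3), ("hive", 2), ("mysql", 1), ("hello", 2), ("pig", 1))
    sortCounts(res4).foreach(println)
  }
}
```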

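A brief note on the parameter type inference and shorthand notation used in the Scala code above:
1. When a parameter's type can be inferred, the type annotation can be omitted.
2. When the function being passed takes a single parameter, the parentheses around that parameter can be omitted.
3. If a parameter appears only once to the right of =>, it can be replaced with _.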
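The three rules can be seen side by side on step 2 of the program. The sketch below writes the same function four ways, from the fully explicit form to the placeholder form; the object name `LambdaShorthand` and the val names are made up for illustration.

```scala
object LambdaShorthand {
  val words: List[String] = List("hadoop", "hive", "hadoop")

  // Full form: explicit parameter type.
  val explicit = words.map((w: String) => (w, 1))
  // Rule 1: the type is inferred from List[String], so the annotation is dropped.
  val inferred = words.map((w) => (w, 1))
  // Rule 2: a single parameter needs no parentheses.
  val noParens = words.map(w => (w, 1))
  // Rule 3: the parameter is used once on the right of =>, so _ can stand in for it.
  val placeholder = words.map((_, 1))

  def main(args: Array[String]): Unit = {
    // All four forms produce the same list of (word, 1) pairs.
    println(explicit == inferred && inferred == noParens && noParens == placeholder)
  }
}
```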

If any of the higher-order functions used in the code above are unclear, this blogger's article may help:
https://blog.csdn.net/m0_38109926/article/details/108695731
