Spark RDD Operations 1


Perform a word-frequency count on the README.md file in the Spark installation directory. In the session below, flatMap splits each line into words, map pairs each word with the count 1, and reduceByKey sums the counts for every distinct word.

scala> val textFile=sc.textFile("file:///usr/local/spark/README.md")
textFile: org.apache.spark.rdd.RDD[String] = file:///usr/local/spark/README.md MapPartitionsRDD[79] at textFile at <console>:31

scala> val wordcounts=textFile.flatMap(line=>line.split(" ")).map(word=>(word,1)).reduceByKey((a,b)=>a+b)
wordcounts: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[82] at reduceByKey at <console>:32

scala> wordcounts.collect()
res21: Array[(String, Int)] = Array((package,1), (For,3), (Programs,1), (processing.,1), (Because,1), (The,1), (page](http://spark.apache.org/documentation.html).,1), (cluster.,1), (its,1), ([run,1), (than,1), (APIs,1), (have,1), (Try,1), (computation,1), (through,1), (several,1), (This,2), (graph,1), (Hive,2), (storage,1), (["Specifying,1), (To,2), ("yarn",1), (Once,1), (["Useful,1), (prefer,1), (SparkPi,2), (engine,1), (version,1), (file,1), (documentation,,1), (processing,,1), (the,24), (are,1), (systems.,1), (params,1), (not,1), (different,1), (refer,2), (Interactive,2), (R,,1), (given.,1), (if,4), (build,4), (when,1), (be,2), (Tests,1), (Apache,1), (thread,1), (programs,,1), (including,4), (./bin/run-example,2), (Spark.,1), (package.,1), (1000).count(),1), (Versions,1), (HDFS,1), (...
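The collect() output above is unordered, since reduceByKey only aggregates by key. As a minimal follow-up sketch (assuming the same wordcounts RDD from the session above; top10 and the output path are hypothetical examples), the counts can be sorted in descending order to list the most frequent words, and the full result can be written out:

scala> // Hypothetical follow-up: sort by count, descending, and take the ten most frequent words
scala> val top10 = wordcounts.sortBy(_._2, ascending = false).take(10)

scala> // Hypothetical example path: write the full result as a directory of part files
scala> wordcounts.saveAsTextFile("file:///tmp/wordcounts")

Note that reduceByKey (and sortBy) each trigger a shuffle across partitions, which is why the wordcounts RDD above is reported as a ShuffledRDD.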