Spark基础

最新推荐文章于 2020-12-19 21:45:46 发布

weixin_30389003

最新推荐文章于 2020-12-19 21:45:46 发布

阅读量114

点赞数

文章标签：大数据

原文链接：http://www.cnblogs.com/zhouhb/p/10363070.html

版权

1 读取本地文件

./spark-shell

scala> val textFile=sc.textFile("file:///home/hadoop/wordfile1.txt")
textFile: org.apache.spark.rdd.RDD[String] = file:///home/hadoop/wordfile1.txt MapPartitionsRDD[3] at textFile at <console>:24

scala> textFile.first()
res2: String = I love Spark

2 读取hdfs文件

首先要启动hdfs，然后上传文件至hdfs，才能用下面的命令读取。

scala> val textFile=sc.textFile("hdfs://localhost:9000/user/hadoop/input/wordfile1.txt")
textFile: org.apache.spark.rdd.RDD[String] = hdfs://localhost:9000/user/hadoop/input/wordfile1.txt MapPartitionsRDD[7] at textFile at <console>:24

scala> textFile.first()
res4: String = I love Spark

scala> val textFile=sc.textFile("input/wordfile1.txt")
textFile: org.apache.spark.rdd.RDD[String] = input/wordfile1.txt MapPartitionsRDD[9] at textFile at <console>:24

scala> textFile.first()
res5: String = I love Spark

scala> val textFile=sc.textFile("/user/hadoop/input/wordfile1.txt")
textFile: org.apache.spark.rdd.RDD[String] = /user/hadoop/input/wordfile1.txt MapPartitionsRDD[11] at textFile at <console>:24

scala> textFile.count()
res6: Long = 2

scala> textFile.first()
res8: String = I love Spark

3 词频统计

scala> val wordCount=textFile.flatMap(line=>line.split(" ")).map(word=>(word,1)).reduceByKey((a,b)=>(a+b))
wordCount: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[14] at reduceByKey at <console>:26

scala> wordCount.collect()
res9: Array[(String, Int)] = Array((Spark,1), (love,2), (I,2), (Hadoop,1))

转载于:https://www.cnblogs.com/zhouhb/p/10363070.html

weixin_30389003

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
Spark基础

1 读取本地文件./spark-shellscala> val textFile=sc.textFile("file:///home/hadoop/wordfile1.txt")textFile: org.apache.spark.rdd.RDD[String] = file:///home/hadoop/wordfile1.txt MapPartitionsRDD[3] at t...
复制链接

扫一扫

Spark基础

“相关推荐”对你有帮助么？