Tutorial on configuring Scala + Spark: https://blog.csdn.net/qq_40343117/article/details/100974950
1. First, go to /usr/local/spark/ and locate the README.md file; we will upload it to the Hadoop cluster to use as input data.
Enter:
hadoop dfs -mkdir /scala
hadoop dfs -put /usr/local/spark/README.md /scala/
(On recent Hadoop versions, hdfs dfs is the preferred form of these commands; hadoop dfs still works but prints a deprecation warning.)
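To verify that the upload landed where the next steps expect it (a quick sanity check, assuming the same /scala target directory as above), list the directory:
hadoop dfs -ls /scala/
You should see README.md in the listing.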
2. Run /usr/local/spark/bin/spark-shell.
From /usr/local/spark/, enter:
./bin/spark-shell
This starts the interactive Spark shell and drops you at the Spark command line.
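Once the shell is up, it provides a ready-made SparkContext bound to the variable sc, which the next step relies on. Two quick ways to confirm it is alive (what they print depends on your installation):
sc.version   // the Spark version string
sc.master    // the master URL the shell connected to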
3. Run the commands that implement WordCount.
Enter:
val textFile=sc.textFile("hdfs://h01:9000/scala/README.md")
val wordCounts = textFile.flatMap(line=>line.split(" ")).map(word=>(word,1)).reduceByKey((a,b)=>a+b)
wordCounts.foreach(println)
This code reads the file from the cluster, splits each line into words, pairs each word with a count of 1, sums the counts per word into a collection of (word, count) pairs, and then iterates over that collection to print each pair. The same pipeline is spelled out step by step just below, and after that a diagram explains it in detail:
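For readability, here is the same computation with each intermediate RDD named and annotated (equivalent to the chained one-liner above):
val lines = sc.textFile("hdfs://h01:9000/scala/README.md")   // RDD[String]: one element per line of the file
val words = lines.flatMap(line => line.split(" "))           // RDD[String]: one element per word
val pairs = words.map(word => (word, 1))                     // RDD[(String, Int)]: each word paired with a count of 1
val wordCounts = pairs.reduceByKey((a, b) => a + b)          // RDD[(String, Int)]: counts summed per distinct word
wordCounts.foreach(println)                                  // prints each (word, count) pair (on the driver only in local mode)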
On success it displays:
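Once the count prints correctly, two optional follow-ups are worth trying in the same spark-shell session. This is a sketch assuming the wordCounts value from step 3; the output path below is an arbitrary example, and the target directory must not already exist:
// Show the ten most frequent words: swap key and value, sort by count descending, take the first ten.
wordCounts.map { case (word, count) => (count, word) }
  .sortByKey(ascending = false)
  .take(10)
  .foreach { case (count, word) => println(s"$word: $count") }
// Persist the full result back to HDFS instead of printing it.
wordCounts.saveAsTextFile("hdfs://h01:9000/scala/wordcount-output")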