一、The spark-shell approach
Note: the word file must first be uploaded to HDFS; for the relevant HDFS commands, see this post:
https://blog.csdn.net/u010916338/article/details/81102346?ops_request_misc=%257B%2522request%255Fid%2522%253A%2522158946937419724835823007%2522%252C%2522scm%2522%253A%252220140713.130102334.pc%255Fblog.%2522%257D&request_id=158946937419724835823007&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2blogfirst_rank_v2~rank_v25-1-81102346.nonecase&utm_term=hdfs%E5%91%BD%E4%BB%A4
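If the file is not on HDFS yet, it can be uploaded with `hdfs dfs -put`; a minimal sketch, assuming a local `input.txt` in the current directory and a configured HDFS client:

```shell
# upload the local word file to HDFS (-f overwrites any existing copy)
hdfs dfs -put -f input.txt /tmp/input.txt
# confirm it arrived
hdfs dfs -ls /tmp/input.txt
```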
// read the word file from HDFS
val file = sc.textFile("hdfs:/tmp/input.txt")
// split each line into words, map each word to (word, 1), then sum the counts per word
val rdd = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
// collect the results to the driver and print them
rdd.collect()
rdd.foreach(println)
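The expected output can be sanity-checked without Spark; the sketch below rebuilds the same (word, count) pairs with standard shell tools on a small local copy of the input (the sample content is hypothetical):

```shell
# hypothetical sample of the word file
printf 'hello world\nhello spark\n' > input.txt
# one word per line, sort so duplicates are adjacent, then count them
tr ' ' '\n' < input.txt | sort | uniq -c
```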
二、The jar approach
2.1 Create a Scala project
Reference:
https://blog.csdn.net/u010916338/article/details/81097439?ops_request_misc=%257B%2522request%255Fid%2522%253A%2522158947247719724811847084%2522%252C%2522scm%2522%253A%252220140713.130102334.pc%255Fblog.%2522%257D&request_id=158947247719724811847084&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2blogfirst_rank_v2~rank_v25-9-81097439.nonecase&utm_term=Scala
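For reference, a minimal `build.sbt` for such a project might look like the following (the project name, Scala version, and Spark version here are assumptions; match them to your cluster):

```scala
name := "word-count"

// must match the Scala version your Spark distribution was built with
scalaVersion := "2.11.12"

// "provided" keeps Spark itself out of the jar, since the cluster supplies it at runtime
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.8" % "provided"
```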
2.2 Scala code
package cn

import org.apache.spark.{SparkConf, SparkContext}

/**
 * Word count, packaged as a jar and submitted to a standalone cluster.
 */
object WordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
    conf.setAppName("SparkDemo1") // application name shown in the Spark UI
    conf.setMaster("spark://big07:7077") // standalone master
    //conf.set("spark.shuffle.manager", "hash")
    // build the SparkContext from the configuration
    val sc = new SparkContext(conf)
    // read the input with 2 partitions, split each line into words, and count each word
    // (groupBy + mapValues is equivalent to the more idiomatic reduceByKey(_ + _))
    val rdd1 = sc.textFile("/tmp/test/word", 2)
    val rdd2 = rdd1.flatMap(_.split(" "))
      .map((_, 1))
      .groupBy(_._1)
      .mapValues(_.map(_._2).reduce(_ + _))
    // write the result back to HDFS; the output directory must not already exist
    rdd2.saveAsTextFile("/tmp/test/results")
    sc.stop()
  }
}
2.3 Package the Scala project into a jar
Reference:
https://blog.csdn.net/u010916338/article/details/89766059?ops_request_misc=%257B%2522request%255Fid%2522%253A%2522158947003219195162522413%2522%252C%2522scm%2522%253A%252220140713.130102334.pc%255Fblog.%2522%257D&request_id=158947003219195162522413&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2blogfirst_rank_v2~rank_v25-2-89766059.nonecase&utm_term=idea%E6%89%93%E5%8C%85
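Besides packaging from the IDE as in the post above, an sbt project can also be packaged from the command line; a sketch, assuming sbt is installed and the command is run from the project root:

```shell
# compile and package the classes into target/scala-<version>/<name>.jar
sbt package
```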
2.4 Submit the jar to Spark
spark-submit --class cn.WordCount word.jar
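Because the master URL is hard-coded in the program, the short form above is enough; a more explicit invocation might look like the following sketch (the memory and core values are assumptions, not settings from this walkthrough):

```shell
spark-submit \
  --class cn.WordCount \
  --master spark://big07:7077 \
  --executor-memory 1g \
  --total-executor-cores 2 \
  word.jar
```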
2.5 Monitor the job in the Spark UI
While the job runs, its progress can be followed in the Spark web UI: by default the standalone master UI is served on port 8080, and each running application exposes its own UI on the driver at port 4040.
2.6 Download the HDFS output to the local Linux filesystem
hdfs dfs -get /tmp/test/results
2.7 View the result files
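After `hdfs dfs -get`, the download is a directory containing one `part-` file per partition plus a `_SUCCESS` marker, with each line holding a `(word,count)` tuple. A sketch of inspecting it locally (the directory contents below are mocked so the commands are self-contained):

```shell
# mock the downloaded directory layout for illustration
mkdir -p results
printf '(hello,2)\n' > results/part-00000
printf '(spark,1)\n(world,1)\n' > results/part-00001
touch results/_SUCCESS
# list the directory and print every part file
ls results
cat results/part-*
```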