A Hadoop cluster and a Spark cluster must be prepared in advance!!!
Running WordCount locally
package com.ect.scala

import org.apache.spark.{SparkConf, SparkContext}

object WordCountScala {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("WordCount")
      .setMaster("local")
    val sc = new SparkContext(conf)
    // Read the input file from the local (Windows) file system
    val lines = sc.textFile("f:/all/a.txt")
    val words = lines.flatMap(line => line.split(","))
    val pairs = words.map(word => (word, 1))
    val wordCount = pairs.reduceByKey(_ + _)
    wordCount.foreach { case (word, count) => println(word + " appeared " + count + " times.") }
    sc.stop()
  }
}
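The same transformation chain can be traced with plain Scala collections, without a Spark cluster. A minimal sketch with made-up input lines (the `count` helper and its sample data are illustrative, not from the original):

```scala
object WordCountSketch {
  // Mirrors the Spark pipeline: flatMap -> map -> reduceByKey
  def count(lines: Seq[String]): Map[String, Int] =
    lines.flatMap(_.split(","))
      .map(word => (word, 1))
      .groupBy(_._1)
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }

  def main(args: Array[String]): Unit =
    count(Seq("hello,spark", "hello,scala"))
      .foreach { case (word, n) => println(word + " appeared " + n + " times.") }
}
```

This is handy for unit-testing the counting logic before packaging the job.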
Running WordCount on Linux
First build the jar and copy it to the Linux machine, then upload the input file to HDFS (hadoop fs -put /opt/a.txt /).
1) Write a submit script (with vim) containing:
/usr/local/spark/bin/spark-submit \
--class com.ect.scala.WordCountScala \
--num-executors 3 \
--driver-memory 100m \
--executor-memory 100m \
--executor-cores 3 \
/opt/spark-1.0-SNAPSHOT.jar

--class takes the fully qualified name of the main class; the final argument is the path to the packaged jar. Note that the last line must not end with a backslash.
2) Upload the input file to HDFS: hadoop fs -put /opt/a.txt / (the destination is the HDFS root, so the job can read it as /a.txt)
a.txt
hello,wangcc
hello,yaoshuai
hello,xiaoqi
hello,xiaoqi
hello,mayun
hello,mayun
hello,xjp
hello,xjp
hello,xjp
hello,boss
hello,bios
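Given this input, the job should report hello 11 times, xjp 3 times, xiaoqi and mayun twice each, and wangcc, yaoshuai, boss, and bios once each. A quick sanity check with plain Scala collections (this checker object is illustrative, not part of the job):

```scala
object ExpectedCounts {
  // The eleven lines of a.txt, split on ',' exactly as the Spark job does
  val lines: Seq[String] = Seq(
    "hello,wangcc", "hello,yaoshuai", "hello,xiaoqi", "hello,xiaoqi",
    "hello,mayun", "hello,mayun", "hello,xjp", "hello,xjp", "hello,xjp",
    "hello,boss", "hello,bios")

  val counts: Map[String, Int] =
    lines.flatMap(_.split(",")).groupBy(identity).map { case (w, ws) => (w, ws.size) }

  def main(args: Array[String]): Unit =
    counts.toSeq.sortBy { case (w, n) => (-n, w) }
      .foreach { case (w, n) => println(w + " appeared " + n + " times.") }
}
```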
package com.ect.scala

import org.apache.spark.{SparkConf, SparkContext}

object WordCountScala {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("WordCount")
      .set("spark.testing.memory", "2147480000")
      // .setMaster("local")  // the master is supplied by spark-submit on the cluster
    val sc = new SparkContext(conf)
    // Read the file previously uploaded to HDFS
    val lines = sc.textFile("/a.txt")
    val words = lines.flatMap(line => line.split(","))
    val pairs = words.map(word => (word, 1))
    val wordCount = pairs.reduceByKey(_ + _)
    // Note: in cluster mode this println runs on the executors,
    // so the output appears in the executor logs, not the driver console.
    wordCount.foreach { case (word, count) => println(word + " appeared " + count + " times.") }
    sc.stop()
  }
}