I ran into quite a few problems writing my first Spark program, so I am recording them here.
The development tool is the latest IntelliJ IDEA 2016.3.
There are plenty of online guides for creating a new project, so that part is not covered here.
1. Create a new App.scala file.
2. Modify the pom.xml file:
<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.0.2</version>
    </dependency>
</dependencies>
<properties>
    <scala.version>2.11.7</scala.version>
</properties>
The Scala version must match the one the Spark artifact is built against: spark-core_2.11 requires Scala 2.11.x, hence scala.version 2.11.7 above.
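A quick way to confirm the match is to print both versions at runtime. This is a minimal sketch (the VersionCheck object name is only for illustration): util.Properties.versionString reports the Scala version on the classpath, and sc.version reports Spark's.

import org.apache.spark.{SparkConf, SparkContext}

object VersionCheck {
  def main(args: Array[String]): Unit = {
    // Scala version actually on the classpath, e.g. "version 2.11.7"
    println("Scala: " + util.Properties.versionString)
    val sc = new SparkContext(new SparkConf().setAppName("VersionCheck").setMaster("local"))
    // Spark version of the running context, e.g. "2.0.2"
    println("Spark: " + sc.version)
    sc.stop()
  }
}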
3. Click Edit Configurations and add -Dspark.master=local to VM options; otherwise the program fails with the error
A master URL must be set in your configuration
Reference: http://blog.csdn.net/shenlanzifa/article/details/42679577
4. Write the main method in App.scala:

import org.apache.spark.{SparkConf, SparkContext}

object App {
  def main(args: Array[String]): Unit = {
    val logFile = "file:///D:\\crawler-beans.cxml"
    // Point hadoop.home.dir at the local Hadoop installation (required on Windows)
    System.setProperty("hadoop.home.dir", "E:\\software\\linux\\hadoop-2.7.2")
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    // Load the file as an RDD with 2 partitions and cache it for the two passes below
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
    sc.stop()
  }
}
The key addition (shown in red in the original post) is the hadoop.home.dir line; point it at your local Hadoop folder.
5. Download winutils.exe and place it in the bin directory of that Hadoop folder.
If it is not in that directory, the program fails with the error Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
Reference:
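To fail fast with a clear message when winutils.exe is missing, a check like the following can run before creating the SparkContext; a minimal sketch, reusing the hadoop.home.dir path from step 4:

import java.io.File

// Verify winutils.exe exists where Hadoop will look for it, so a missing
// file fails with an explicit message instead of a deep stack trace.
val hadoopHome = "E:\\software\\linux\\hadoop-2.7.2"
val winutils = new File(hadoopHome, "bin\\winutils.exe")
require(winutils.exists(), s"winutils.exe not found at ${winutils.getAbsolutePath}")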
With that, the first Spark project runs end to end.