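1. Install the JDK.
2. Install Scala (the version must match the Scala version your Spark build uses).
3. Install Scala IDE for Eclipse (site: http://scala-ide.org/ ; again, pick the release that matches Spark's Scala version).
4. Download Spark.
5. Open Eclipse and create a new Scala project with a Scala class (object) in it.
6. Add the Spark assembly jar to the project's build path: \spark-1.6.1-bin-hadoop2.6\lib\spark-assembly-1.6.1-hadoop2.6.0.jar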
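7. Change the project's Scala library to the Scala version used by your Spark build:
Right-click the project and choose Build Path --> Configure Build Path.
Then select the Scala library and click Edit.
Pick the matching version.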
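A quick way to confirm that step 7 took effect is to print the Scala version the project actually runs with. This is only a minimal sketch: the object name ScalaVersionCheck and the helper binaryVersion are made up for illustration, and the expected value "2.10" is an assumption (the prebuilt Spark 1.6.1 packages are, as far as I know, built against Scala 2.10), so adjust it to your own Spark build.

```scala
object ScalaVersionCheck {
  // Reduce a full version such as "2.10.5" to its binary version "2.10".
  def binaryVersion(full: String): String =
    full.split('.').take(2).mkString(".")

  def main(args: Array[String]): Unit = {
    // Assumption: the Spark build in use was compiled against Scala 2.10.
    val expected = "2.10"
    // Version of the Scala library that is on this project's build path.
    val actual = binaryVersion(scala.util.Properties.versionNumberString)
    println("Project Scala library: " + actual + ", expected: " + expected)
    if (actual != expected)
      println("Mismatch: change the Scala library under Configure Build Path.")
  }
}
```

Run it as a Scala application inside the same project; if it reports a mismatch, repeat step 7.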
8. Run the code. Note that the logFile path in the code must be changed to a file on your own machine.
/* SimpleApp.scala */
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]): Unit = {
    val logFile = "YOUR_SPARK_HOME/README.md" // Should be some file on your system
    // setMaster("local[2]") is added here for running inside Eclipse;
    // drop it if you launch the job with spark-submit instead.
    val conf = new SparkConf().setAppName("Simple Application").setMaster("local[2]")
    val sc = new SparkContext(conf)
    // Read the file into at least 2 partitions and cache it, since it is scanned twice below
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
    sc.stop() // Shut down the local Spark context
  }
}
9. Errors will be reported, but the program still runs and prints the result. The cause is that Hadoop is not installed.
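If the error in step 9 is the usual Windows one about a missing winutils.exe (an assumption, since the exact message is not quoted here), one common workaround is to point the hadoop.home.dir system property at a folder that contains bin\winutils.exe before the SparkContext is created. The sketch below only shows setting the property; the object name HadoopHomeSetup and the folder C:\hadoop are hypothetical, and you still have to obtain a winutils.exe that matches your Hadoop version yourself.

```scala
object HadoopHomeSetup {
  // Hypothetical folder; it must contain bin\winutils.exe for the error to go away.
  val assumedHadoopHome = "C:\\hadoop"

  // Set hadoop.home.dir unless it was already provided (for example with -D),
  // and return the value that is in effect afterwards.
  def configure(home: String): String = {
    if (System.getProperty("hadoop.home.dir") == null)
      System.setProperty("hadoop.home.dir", home)
    System.getProperty("hadoop.home.dir")
  }

  def main(args: Array[String]): Unit = {
    println("hadoop.home.dir = " + configure(assumedHadoopHome))
  }
}
```

In SimpleApp, the call HadoopHomeSetup.configure(HadoopHomeSetup.assumedHadoopHome) would go at the top of main, before new SparkContext(conf). Since the job already produces results without it, this is optional.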