1. After installing the JDK and Scala (mind version compatibility: the prebuilt Spark 2.1.0 binaries target Scala 2.11), download a matching Spark release.
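To double-check the Scala side before going further, here is a minimal sanity check (run inside the scala REPL; the exact output string will vary with your installation):
// Prints something like "version 2.11.8"; it should match the Scala
// line that the Spark build was compiled against.
scala.util.Properties.versionString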
2. Extract spark-2.1.0-bin-hadoop2.7.tgz and move the resulting directory to wherever you want it (e.g. /opt).
3. Configure the environment variables:
sudo gedit /etc/profile
Add the following two lines (note that SPARK_HOME must point to the extracted directory, not the .tgz archive):
export SPARK_HOME=/opt/spark-2.1.0-bin-hadoop2.7
export PATH=$SPARK_HOME/bin:$PATH
Then reload the profile:
source /etc/profile
4. Enter the conf folder: cd /opt/spark-2.1.0-bin-hadoop2.7/conf
Copy spark-env.sh.template and name the copy spark-env.sh:
cp spark-env.sh.template spark-env.sh
Add two lines of configuration, which bind the master and the driver to the loopback address so this single-machine setup does not depend on hostname resolution:
export SPARK_MASTER_IP=127.0.0.1
export SPARK_LOCAL_IP=127.0.0.1
5. Enter the bin directory and run spark-shell; it should start up successfully:
cd /opt/spark-2.1.0-bin-hadoop2.7/bin
./spark-shell
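Once the shell is up, a few one-liners confirm the environment (a minimal sketch; sc is the SparkContext that spark-shell creates for you, and the exact values depend on your machine):
// Spark version of the running shell, e.g. "2.1.0"
sc.version
// Master URL; a bare ./spark-shell launch runs in local mode, e.g. "local[*]"
sc.master
// Confirms that the SPARK_HOME export from /etc/profile is visible
sys.env.get("SPARK_HOME")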
Test with a WordCount (../README.md resolves to the README shipped at the top of the Spark directory, since we are running from bin):
val counts = sc.textFile("../README.md").flatMap(line => line.split(" ")).
  map(word => (word, 1)).reduceByKey(_ + _)
counts.foreach(println)
If the (word, count) pairs print out, the installation works. Keep foreach off the val binding: it is an action that returns Unit, so binding it would leave counts unusable as an RDD.
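As a follow-up in the same session (a sketch that assumes the counts RDD from the test above), you can rank the results instead of printing every pair:
// Sort by count in descending order and bring only the top 10 pairs to the driver
counts.sortBy(_._2, ascending = false).take(10).foreach(println)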