I. Download Spark
First, download the Spark binary package from http://spark.apache.org/downloads.html. This guide uses spark-2.0.2 as the example version.
II. Installation
1. Place the downloaded package in the /usr/local directory and extract it:
cd /usr/local
tar zxvf spark-2.0.2-bin-without-hadoop.tgz
mv spark-2.0.2-bin-without-hadoop spark   # rename the directory
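The extract-and-rename step can be sketched end to end. The snippet below is a self-contained illustration: it fabricates a dummy archive with the same layout in a temporary directory, so it runs without the real download and without touching /usr/local.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

# Fabricate a stand-in for the real spark-2.0.2-bin-without-hadoop.tgz
mkdir -p spark-2.0.2-bin-without-hadoop/bin
tar czf spark-2.0.2-bin-without-hadoop.tgz spark-2.0.2-bin-without-hadoop
rm -r spark-2.0.2-bin-without-hadoop

# The two commands from the tutorial: extract, then rename to a stable path
tar zxvf spark-2.0.2-bin-without-hadoop.tgz
mv spark-2.0.2-bin-without-hadoop spark

ls -d spark/bin   # prints "spark/bin"
```

Renaming to a version-free path like `spark` keeps later configuration (PATH entries, SPARK_HOME) stable across upgrades.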
2. Create Spark's configuration file spark-env.sh from the bundled template (the template lives in the conf directory):
cd /usr/local/spark/conf
cp spark-env.sh.template spark-env.sh
If the package type you downloaded is "Pre-built with user-provided Hadoop [can use with most Hadoop distributions]", add the following line at the top of spark-env.sh:
export SPARK_DIST_CLASSPATH=$(/usr/local/hadoop/bin/hadoop classpath)
Otherwise, the SPARK_DIST_CLASSPATH setting above is not needed and Spark will run as-is.
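The `$(...)` in the line above is ordinary shell command substitution: when spark-env.sh is sourced at startup, `hadoop classpath` runs and its output (a colon-separated list of Hadoop config directories and jar globs) becomes the value of SPARK_DIST_CLASSPATH. A minimal runnable sketch of the mechanism, with `hadoop` mocked as a function so the sketch works without a Hadoop install:

```shell
#!/bin/sh
# Mock of `hadoop classpath`: the real command prints the classpath of the
# local Hadoop installation; here a fixed stand-in string is echoed instead.
hadoop() {
  echo "/usr/local/hadoop/etc/hadoop:/usr/local/hadoop/share/hadoop/common/*"
}

# Same shape as the spark-env.sh line: the command runs, its stdout is
# captured, and the result is exported for Spark's JVM to pick up.
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
echo "$SPARK_DIST_CLASSPATH"
# prints /usr/local/hadoop/etc/hadoop:/usr/local/hadoop/share/hadoop/common/*
```

Because the substitution is evaluated each time Spark starts, the classpath automatically tracks whatever the local `hadoop` binary reports.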
Save and exit (":wq" in vi).
For the rationale behind SPARK_DIST_CLASSPATH, see the official documentation: http://spark.apache.org/docs/latest/index.html
Quoting the official documentation: "Spark uses Hadoop's client libraries for HDFS and YARN. Downloads are pre-packaged for a handful of popular Hadoop versions. Users can also download a 'Hadoop free' binary and run Spark with any Hadoop version by augmenting Spark's classpath."