1. Download Spark
1.1 Open the Apache Spark archive at https://archive.apache.org/dist/spark/
Pick the version you need.
We use 2.2.0 as the example; since Hadoop is already installed, download the Spark build prepackaged for Hadoop 2.6 (spark-2.2.0-bin-hadoop2.6.tgz).
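For example, the package used in this tutorial can be fetched directly from the archive:
wget https://archive.apache.org/dist/spark/spark-2.2.0/spark-2.2.0-bin-hadoop2.6.tgz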
1.2 Required environment
JDK 1.8.0
Hadoop 2.6.0
Scala 2.11.0
Spark 2.2.0
Note: since version 2.0, Spark is built with Scala 2.11 by default. Scala 2.10 users need to download the Spark source package and build it with Scala 2.10 support.
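Before continuing, you can confirm that the installed versions match the list above:
java -version
hadoop version
scala -version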
2. Extract
Upload spark-2.2.0-bin-hadoop2.6.tgz to the virtual machine and extract it:
tar -zxvf spark-2.2.0-bin-hadoop2.6.tgz -C apps/
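The rest of this guide refers to the installation simply as apps/spark; assuming you want that shorter path, create a symlink to the versioned directory:
cd apps/
ln -s spark-2.2.0-bin-hadoop2.6 spark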
3. Configuration
3.1 Configure environment variables
vi /etc/profile
Append:
export SCALA_HOME=/root/apps/scala-2.11.0
export SPARK_HOME=/home/hadoop/apps/spark
export PATH=$PATH:$JAVA_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin
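Reload the profile so the new variables are visible in the current shell, and verify:
source /etc/profile
echo $SPARK_HOME    # should print /home/hadoop/apps/spark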
3.2 Configure Spark
Enter the Spark configuration directory: cd /home/hadoop/apps/spark/conf
cp spark-env.sh.template spark-env.sh
cp slaves.template slaves
cp spark-defaults.conf.template spark-defaults.conf
Edit the environment settings:
vi spark-env.sh
JAVA_HOME=/root/apps/jdk1.8.0_171
SCALA_HOME=/root/apps/scala-2.11.0
HADOOP_HOME=/home/hadoop/apps/hadoop-2.6.0
HADOOP_CONF_DIR=/home/hadoop/apps/hadoop-2.6.0/etc/hadoop
SPARK_MASTER_HOST=mini1      # SPARK_MASTER_IP is deprecated since Spark 2.0
SPARK_MASTER_PORT=7077
SPARK_MASTER_WEBUI_PORT=8080
SPARK_WORKER_CORES=1         # CPU cores each worker may use
SPARK_WORKER_MEMORY=2g       # many Spark components default to 1g, so give each worker more than 1g
SPARK_WORKER_PORT=7078
SPARK_WORKER_WEBUI_PORT=8081
SPARK_WORKER_INSTANCES=1     # worker processes per node
List the worker nodes:
vi slaves
mini2
mini3
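sbin/start-all.sh later starts a worker on every host listed in slaves over SSH, so mini1 needs passwordless SSH to each of them. Assuming the keys are already distributed, a quick check:
ssh mini2 hostname
ssh mini3 hostname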
Set the defaults:
vi spark-defaults.conf
spark.master spark://mini1:7077
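Two other defaults are often set at the same time; the sketch below assumes you want finished applications to stay visible via the event log, and hdfs://mini1:9000/spark-logs is only a hypothetical path that must already exist in HDFS:
spark.eventLog.enabled true
spark.eventLog.dir hdfs://mini1:9000/spark-logs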
4. Distribute and start
4.1 Distribute
Copy spark to the other nodes, along with the environment variables:
scp -r spark/ mini2:/home/hadoop/apps/
scp -r spark/ mini3:/home/hadoop/apps/
sudo scp /etc/profile mini2:/etc/profile
sudo scp /etc/profile mini3:/etc/profile
Refresh the environment (run this on every node):
source /etc/profile
4.2 Start
1. Start Hadoop
sh start-ha.sh
#!/bin/bash
# Bring up the HA stack in order: ZooKeeper, then the JournalNodes, then HDFS and YARN.
echo "-----------starting zookeeper-----------"
for hostname in mini1 mini2 mini3
do
    ssh $hostname "source /etc/profile;/home/hadoop/apps/zookeeper/zookeeper-3.4.10/bin/zkServer.sh start"
    echo "$hostname zk is running"
done
# The JournalNodes must be running before the NameNodes so the HA edit log can be shared.
for hostname in mini1 mini2 mini3
do
    ssh $hostname "source /etc/profile;/home/hadoop/apps/hadoop-2.6.0/sbin/hadoop-daemon.sh start journalnode"
    echo "$hostname journalnode is running"
done
ssh mini1 "source /etc/profile;/home/hadoop/apps/hadoop-2.6.0/sbin/start-dfs.sh"
echo "mini1 start-dfs"
ssh mini1 "source /etc/profile;/home/hadoop/apps/hadoop-2.6.0/sbin/start-yarn.sh"
echo "mini1 start-yarn"
Run jps to check the status.
Then open http://mini1:50070 or http://mini2:50070; one NameNode should report active and the other standby.
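For reference, with an HA layout like this one, jps on mini1 would typically show processes along these lines (the exact set depends on where each role runs):
QuorumPeerMain
JournalNode
NameNode
DFSZKFailoverController
ResourceManager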
2. Start Spark
cd apps/spark/
sbin/start-all.sh
Run jps again: mini1 should now also show a Master process, and mini2/mini3 each a Worker process.
Visit http://mini1:8080 to open the master web UI; both workers should be listed as ALIVE.
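As a final smoke test, submit the bundled SparkPi example to the new cluster from the spark directory (the jar name matches the Spark 2.2.0 / Scala 2.11 build):
bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://mini1:7077 examples/jars/spark-examples_2.11-2.2.0.jar 100
The driver output should contain a line like "Pi is roughly 3.14".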