I. Installation
Unpack the installation archives and move them into the installation directory:
tar -zxvf scala-2.13.0.tgz
tar -zxvf spark-2.4.3-bin-hadoop2.7.tgz
sudo mv scala-2.13.0 /usr/local/spark/scala-2.13.0
sudo mv spark-2.4.3-bin-hadoop2.7 /usr/local/spark/spark-2.4.3-bin-hadoop2.7
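The mv commands above assume /usr/local/spark already exists; if not, create it first and then confirm both directories landed where expected:
sudo mkdir -p /usr/local/spark    # create the target directory if it is missing
ls /usr/local/spark               # should list scala-2.13.0 and spark-2.4.3-bin-hadoop2.7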
II. Configuration
1. Configure environment variables
export SCALA_HOME=/usr/local/spark/scala-2.13.0
export PATH=$SCALA_HOME/bin:$PATH
export SPARK_HOME=/usr/local/spark/spark-2.4.3-bin-hadoop2.7
export PATH=$SPARK_HOME/bin:$PATH
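These exports are assumed to go into /etc/profile (the same file synced to the slaves in step 5 below); reload it so they take effect in the current shell, then spot-check:
source /etc/profile
echo $SPARK_HOME                  # should print /usr/local/spark/spark-2.4.3-bin-hadoop2.7
scala -version                    # should report Scala 2.13.0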
2. Configure spark-env.sh
export JAVA_HOME=/usr/local/jdk1.8
export SCALA_HOME=/usr/local/spark/scala-2.13.0
export HADOOP_HOME=/usr/local/hadoop/hadoop-3.1.2
export HADOOP_CONF_DIR=/usr/local/hadoop/hadoop-3.1.2/etc/hadoop
export SPARK_MASTER_IP=hadoop-master
export SPARK_WORKER_MEMORY=1g
export SPARK_WORKER_CORES=1
export SPARK_WORKER_INSTANCES=1
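spark-env.sh lives in $SPARK_HOME/conf; a fresh distribution only ships a template, so the file is normally created from it before the lines above are added:
cd /usr/local/spark/spark-2.4.3-bin-hadoop2.7/conf
cp spark-env.sh.template spark-env.sh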
3. Variable descriptions
JAVA_HOME: Java installation directory
SCALA_HOME: Scala installation directory
HADOOP_HOME: Hadoop installation directory
HADOOP_CONF_DIR: directory holding the Hadoop cluster's configuration files
SPARK_MASTER_IP: IP address (or hostname) of the Spark cluster's master node
SPARK_WORKER_MEMORY: maximum total memory each worker node can allocate to executors
SPARK_WORKER_CORES: number of CPU cores each worker node makes available
SPARK_WORKER_INSTANCES: number of worker instances started on each machine
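These per-worker limits cap what any one application can request. As a rough sketch (the master URL is assumed from the hostnames used in this guide, 7077 being the default standalone port, and the examples jar name may differ in your distribution), a job that stays within the 1g/1-core worker above can be submitted like this:
spark-submit --master spark://hadoop-master:7077 \
  --executor-memory 512m --total-executor-cores 1 \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples_2.11-2.4.3.jar 10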
4. Configure slaves
hadoop-slave0
hadoop-slave1
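Like spark-env.sh, the slaves file sits in $SPARK_HOME/conf and is usually created from the shipped template, with one worker hostname per line:
cd /usr/local/spark/spark-2.4.3-bin-hadoop2.7/conf
cp slaves.template slaves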
5. Sync the configuration to the slaves
scp /etc/profile hadoop-slave0:/etc
scp /etc/profile hadoop-slave1:/etc
scp -r /usr/local/spark/scala-2.13.0 hadoop-slave0:/usr/local/spark/scala-2.13.0
scp -r /usr/local/spark/scala-2.13.0 hadoop-slave1:/usr/local/spark/scala-2.13.0
scp -r /usr/local/spark/spark-2.4.3-bin-hadoop2.7 hadoop-slave0:/usr/local/spark/spark-2.4.3-bin-hadoop2.7
scp -r /usr/local/spark/spark-2.4.3-bin-hadoop2.7 hadoop-slave1:/usr/local/spark/spark-2.4.3-bin-hadoop2.7
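The scp commands above assume /usr/local/spark already exists on both slaves, that passwordless ssh is set up, and that the login user may write under /usr/local; a minimal preparation and verification sketch (each slave must also re-read /etc/profile, e.g. by logging in again):
ssh hadoop-slave0 "mkdir -p /usr/local/spark"   # run before the copies if the directory is missing
ssh hadoop-slave1 "mkdir -p /usr/local/spark"
ssh hadoop-slave0 "ls /usr/local/spark"         # afterwards, should list both copied directories
ssh hadoop-slave1 "ls /usr/local/spark"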
III. Startup
Run the start script from Spark's sbin directory (prefix it with ./ so Hadoop's start-all.sh, which has the same name, is not picked up from the PATH instead):
cd /usr/local/spark/spark-2.4.3-bin-hadoop2.7/sbin
./start-all.sh
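Once the script returns, jps can confirm the daemons are running: a Master process should appear on hadoop-master and a Worker process on each slave:
jps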
Access the web UI:
http://hadoop-master:8080
Launch spark-shell
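To attach the shell to this cluster instead of running locally, pass the standalone master URL (7077 is the default master port; the hostname is assumed from this guide):
spark-shell --master spark://hadoop-master:7077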
View running applications:
http://hadoop-master:8080