1. Initial preparation:
- Prepare 3 virtual machines with the following IPs and hostnames:
192.168.5.130 s201
192.168.5.131 s202
192.168.5.132 s203
- Install the JDK and make sure its environment variables take effect
- Set up a working Hadoop cluster (a quick sanity check is sketched below)
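A minimal sanity check, assuming the JDK sits at /opt/java/jdk1.8.0_141 (the path used in spark-env.sh later) and the Hadoop daemons are already running:
# the JDK should be on the PATH
java -version
# NameNode/DataNode etc. should appear if Hadoop is up
jps
# every node must resolve the hostnames; /etc/hosts should contain:
# 192.168.5.130 s201
# 192.168.5.131 s202
# 192.168.5.132 s203
cat /etc/hosts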
2. Install Scala
Archives for each version can be downloaded at the bottom of https://www.scala-lang.org/download/
cd /usr/local
wget https://downloads.lightbend.com/scala/2.13.0/scala-2.13.0.tgz
tar -zxvf scala-2.13.0.tgz
rm scala-2.13.0.tgz
Edit the environment variables:
vim /etc/profile
# add the following
export SCALA_HOME="/usr/local/scala-2.13.0"
export PATH=$PATH:$SCALA_HOME/bin
source /etc/profile
Test:
scala
Welcome to Scala 2.13.0 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_141).
Type in expressions for evaluation. Or try :help.
scala>
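To confirm the REPL actually evaluates expressions, try a trivial one (the exact result formatting varies slightly between Scala versions):
scala> 1 + 1
res0: Int = 2
scala> :quit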
Remember to distribute this to the other machines, for example:
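A distribution sketch, assuming passwordless root SSH from s201 to the workers (adjust the user and paths to your setup):
for host in s202 s203; do
    scp -r /usr/local/scala-2.13.0 root@$host:/usr/local/
    scp /etc/profile root@$host:/etc/profile
    ssh root@$host "source /etc/profile && scala -version"
done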
3. Install Spark
Spark website: http://spark.apache.org/
Note that the pre-built spark-2.4.3-bin-hadoop2.7 tarball bundles its own Scala runtime (2.11 by default), so the cluster itself does not depend on the Scala 2.13.0 installed above; when compiling your own Spark applications, match the Scala version to your Spark build.
cd /usr/local
wget http://mirror.bit.edu.cn/apache/spark/spark-2.4.3/spark-2.4.3-bin-hadoop2.7.tgz
tar -zxvf spark-2.4.3-bin-hadoop2.7.tgz
rm spark-2.4.3-bin-hadoop2.7.tgz
Set the environment variables:
vim /etc/profile
# add the following
export SPARK_HOME="/usr/local/spark-2.4.3-bin-hadoop2.7"
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
source /etc/profile
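A quick check that the binaries are now on the PATH; spark-submit prints its version banner and exits:
spark-submit --version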
Set up the configuration files (the templates live in the conf subdirectory):
cd /usr/local/spark-2.4.3-bin-hadoop2.7/conf
cp spark-env.sh.template spark-env.sh
vim spark-env.sh
# add the following (this file can also be left unconfigured for now)
export JAVA_HOME="/opt/java/jdk1.8.0_141"
export HADOOP_CONF_DIR="/usr/local/hadoop-2.7.7/etc/hadoop"
export SCALA_HOME="/usr/local/scala-2.13.0"
# SPARK_MASTER_IP is deprecated in Spark 2.x; SPARK_MASTER_HOST replaces it
export SPARK_MASTER_HOST=s201
# cores and memory each worker may offer to executors
export SPARK_WORKER_CORES=2
export SPARK_WORKER_MEMORY=2g
cp slaves.template slaves
vim slaves
# change the contents to the following (listing s201 also starts a Worker on the master)
s201
s202
s203
Copy core-site.xml, hdfs-site.xml, and hive-site.xml into Spark's conf directory so Spark can reach HDFS and the Hive metastore, for example:
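A copy sketch; the Hadoop paths match HADOOP_CONF_DIR above, while the hive-site.xml location depends on your Hive install (the $HIVE_HOME path below is an assumption):
cp /usr/local/hadoop-2.7.7/etc/hadoop/core-site.xml $SPARK_HOME/conf/
cp /usr/local/hadoop-2.7.7/etc/hadoop/hdfs-site.xml $SPARK_HOME/conf/
# hive-site.xml path is an assumption; adjust to wherever Hive is installed
cp $HIVE_HOME/conf/hive-site.xml $SPARK_HOME/conf/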
Distribute /etc/profile and the Spark installation directory to the other nodes, then source /etc/profile on each, for example:
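Same assumptions as the Scala distribution step (passwordless root SSH from s201):
for host in s202 s203; do
    rsync -a /usr/local/spark-2.4.3-bin-hadoop2.7 root@$host:/usr/local/
    scp /etc/profile root@$host:/etc/profile
done
# refresh any already-open shells on each node:
source /etc/profile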
Start the cluster from the master node; the script lives in Spark's sbin directory (use the explicit sbin/ path so Hadoop's start-all.sh on the PATH is not picked up by mistake):
cd /usr/local/spark-2.4.3-bin-hadoop2.7
sbin/start-all.sh
Running jps on s201 should now show an additional Master and Worker process, while the other two hosts each show an additional Worker process.
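For an end-to-end check, the standalone master's web UI should be reachable at http://s201:8080, and the bundled SparkPi example can be submitted against the cluster (the examples jar name matches the default Scala 2.11 build of Spark 2.4.3; adjust if your build differs):
spark-submit \
    --master spark://s201:7077 \
    --class org.apache.spark.examples.SparkPi \
    $SPARK_HOME/examples/jars/spark-examples_2.11-2.4.3.jar 100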