Prerequisite: a Hadoop cluster is installed and running successfully
(three machines: master hadoop0, workers hadoop1 and hadoop2).
Hadoop cluster installation reference: https://blog.csdn.net/qq_44734154/article/details/125157180
1. Install Scala
Download the Scala package: https://www.scala-lang.org/download/3.1.2.html
Upload the scala3-3.1.2.tar.gz package to /usr/local/scala and unpack it there:
cd /usr/local/scala && tar -zxvf scala3-3.1.2.tar.gz
Add the environment variables to /etc/profile (this guide assumes the spark-3.2.1-bin-hadoop3.2 distribution has likewise been unpacked under /usr/local/spark, and that HBASE_HOME, HADOOP_HOME and HIVE_HOME are already defined by the existing cluster setup):
export SCALA_HOME=/usr/local/scala/scala3-3.1.2
export SPARK_HOME=/usr/local/spark/spark-3.2.1-bin-hadoop3.2
export PATH=$SCALA_HOME/bin:$SPARK_HOME/bin:$HBASE_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$PATH
Make the variables take effect:
source /etc/profile
hadoop0, hadoop1 and hadoop2 all need this configuration; you can configure one machine and then copy the files to the other two.
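The copy step can be sketched as a small script run from hadoop0. This is only a sketch, assuming passwordless SSH between the nodes (standard for a Hadoop cluster); the DRY_RUN switch is an extra convenience not in the original steps, and with its default of 1 the script only prints the commands so you can review them before running for real.

```shell
# Push the Scala install and /etc/profile from hadoop0 to the workers.
# DRY_RUN=1 (the default) only echoes each command instead of running it;
# set DRY_RUN=0 to actually copy.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = "1" ]; then echo "$@"; else "$@"; fi; }
for host in hadoop1 hadoop2; do
  run scp -r /usr/local/scala "$host:/usr/local/"
  run scp /etc/profile "$host:/etc/profile"
done
```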
2. Edit the Spark configuration files on the master hadoop0
On hadoop0, run:
[root@hadoop0 conf]# cd $SPARK_HOME/conf
[root@hadoop0 conf]# cp workers.template workers
[root@hadoop0 conf]# cp spark-defaults.conf.template spark-defaults.conf
[root@hadoop0 conf]# cp spark-env.sh.template spark-env.sh
Edit workers: delete localhost and add hadoop1 and hadoop2, one per line.
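One non-interactive way to produce the workers file (run in $SPARK_HOME/conf; each line names one worker host):

```shell
# Overwrite the workers file, replacing the template's default
# "localhost" entry with the two worker hostnames.
cat > workers <<'EOF'
hadoop1
hadoop2
EOF
```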
Edit spark-env.sh and add the following (the template's comments describe the default values; adjust to your own environment as needed):
export JAVA_HOME=/usr/java/jdk1.8.0_221-amd64
export SCALA_HOME=/usr/local/scala/scala3-3.1.2
export HADOOP_HOME=/usr/local/hadoop/hadoop-3.3.3
export HADOOP_CONF_DIR=/usr/local/hadoop/hadoop-3.3.3/etc/hadoop
export SPARK_MASTER_HOST=hadoop0
export SPARK_PID_DIR=/usr/local/spark/spark-3.2.1-bin-hadoop3.2/data/pid
export SPARK_LOCAL_DIRS=/usr/local/spark/spark-3.2.1-bin-hadoop3.2/data/spark_shuffle
export SPARK_EXECUTOR_MEMORY=500m
export SPARK_WORKER_MEMORY=4g
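The data/pid and data/spark_shuffle directories referenced by SPARK_PID_DIR and SPARK_LOCAL_DIRS are not part of a fresh Spark unpack; creating them up front on every node avoids relying on the startup scripts to do so. A minimal sketch, using this guide's paths:

```shell
# Pre-create the PID and shuffle-spill directories named in spark-env.sh.
# The path below is the one used throughout this guide; adjust if yours differs.
SPARK_HOME=${SPARK_HOME:-/usr/local/spark/spark-3.2.1-bin-hadoop3.2}
mkdir -p "$SPARK_HOME/data/pid" "$SPARK_HOME/data/spark_shuffle"
```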
Edit spark-defaults.conf, copying the relevant sample lines and adjusting them:
spark.master spark://hadoop0:7077
spark.eventLog.enabled true
spark.eventLog.dir hdfs://hadoop0:9000/eventLog
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.driver.memory 2g
spark.executor.extraJavaOptions -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
If spark.eventLog.dir points at an HDFS path, the corresponding directory must be created on HDFS first:
[root@hadoop0 conf]# hadoop fs -mkdir /eventLog
Add to /root/.bashrc (the start scripts run commands on the workers over non-interactive SSH, and bash typically reads .bashrc rather than /etc/profile in that case):
export JAVA_HOME=/usr/java/jdk1.8.0_221-amd64
3. Copy the configuration from the master hadoop0 to the workers hadoop1 and hadoop2
scp -r $SPARK_HOME/conf hadoop1:$SPARK_HOME/
scp -r $SPARK_HOME/conf hadoop2:$SPARK_HOME/
scp /root/.bashrc hadoop1:/root/
scp /root/.bashrc hadoop2:/root/
4. Start and verify
First check whether the Hadoop cluster is running normally.
If it is not, start it with Hadoop's start-all.sh.
Start Spark (invoked by full path, because Hadoop's sbin directory also puts a start-all.sh on PATH):
bash $SPARK_HOME/sbin/start-all.sh
Check that Spark started: jps on hadoop0 should now show a Master process, and hadoop1/hadoop2 a Worker each.
Run the bundled SparkPi example:
run-example SparkPi
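SparkPi prints its result in among a lot of INFO logging; one way to pull out just the result line (the trailing argument, the number of partitions, is optional):

```shell
# Run the bundled Pi example and keep only the result line.
# Requires $SPARK_HOME/bin on PATH and the cluster started as above.
run-example SparkPi 10 2>&1 | grep "Pi is roughly"
```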
Start spark-shell and inspect the runtime information:
[root@hadoop0 examples]# spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
2022-06-11 18:48:02,489 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Spark context Web UI available at http://hadoop0:4040
Spark context available as 'sc' (master = spark://hadoop0:7077, app id = app-20220611184803-0001).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 3.2.1
/_/
Using Scala version 2.12.15 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_221)
Type in expressions to have them evaluated.
Type :help for more information.
scala>
Open http://hadoop0:4040 (the application UI of the running spark-shell) and http://hadoop0:8080 (the standalone master's web UI) in a browser to view Spark's runtime status.