Spark Cluster Setup
VM configuration:
bigdata-hmaster 192.168.135.112 4 cores, 32 GB
bigdata-hnode1  192.168.135.113 4 cores, 16 GB
bigdata-hnode2  192.168.135.114 4 cores, 16 GB
Commonly used Spark ports:
8081: master web UI
18080: history server; this port is set in the config files below
Add the following entries to /etc/hosts on all three machines; the master node must also be able to SSH into the other two machines without a password:
192.168.135.112 bigdata-hmaster
192.168.135.113 bigdata-hnode1
192.168.135.114 bigdata-hnode2
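The passwordless login mentioned above can be set up on the master node roughly as follows (a sketch; it assumes the same user account on all three machines and the default key path):

```shell
# On bigdata-hmaster: generate a key pair if one does not exist yet
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

# Copy the public key to the two worker nodes (prompts for each password once)
ssh-copy-id bigdata-hnode1
ssh-copy-id bigdata-hnode2

# Verify: these should print the remote hostnames without a password prompt
ssh bigdata-hnode1 hostname
ssh bigdata-hnode2 hostname
```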
1. Download the installation package; we use version 3.3.2 here.
A Scala environment must be set up beforehand.
Download link: https://dlcdn.apache.org/spark/spark-3.3.2/spark-3.3.2-bin-hadoop3.tgz
Extract the archive and rename the directory:
tar -zxvf spark-3.3.2-bin-hadoop3.tgz
mv spark-3.3.2-bin-hadoop3 spark-3.3.2
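Before continuing, it is worth confirming the prerequisites are on the PATH; a quick check might look like:

```shell
java -version     # Spark 3.3.x runs on Java 8, 11, or 17
scala -version    # the prebuilt Spark 3.3.2 package targets Scala 2.12
hadoop version    # should match the "hadoop3" build of the Spark package
```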
2. Edit the configuration files workers, spark-env.sh, and spark-defaults.conf. First create them from the bundled templates:
cp workers.template workers
cp spark-env.sh.template spark-env.sh
cp spark-defaults.conf.template spark-defaults.conf
workers
bigdata-hnode1
bigdata-hnode2
spark-env.sh
export JAVA_HOME=/usr/local/lib/jdk1.8.0_333
export HADOOP_HOME=/usr/local/lib/hadoop-3.2.4
export HADOOP_CONF_DIR=/usr/local/lib/hadoop-3.2.4/etc/hadoop
export SPARK_CLASSPATH=/usr/local/lib/spark-3.3.2
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 -Dspark.history.fs.logDirectory=hdfs://bigdata-hmaster:8020/spark/sparklog -Dspark.history.retainedApplications=30"
export SPARK_EXECUTOR_CORES=1
export SPARK_EXECUTOR_MEMORY=1G
spark-defaults.conf
spark.master spark://bigdata-hmaster:7077
spark.eventLog.enabled true
spark.eventLog.dir hdfs://bigdata-hmaster:8020/spark/sparklog
spark.yarn.jars hdfs://bigdata-hmaster:8020/spark/sparkjar/*
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.driver.cores 2
spark.driver.memory 2g
spark.cores.max 4
spark.yarn.historyServer.address bigdata-hmaster:18080
spark.history.ui.port 18080
spark.executor.extraJavaOptions -XX:+PrintGCDetails
Distribute the configuration to the worker nodes (run from $SPARK_HOME; the Spark installation itself must also exist at the same path on the workers):
scp -r conf bigdata-hnode1:$PWD
scp -r conf bigdata-hnode2:$PWD
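If the Spark installation has not been copied to the workers yet, the whole directory can be distributed the same way (a sketch; the paths follow the layout used above):

```shell
# Run on bigdata-hmaster; copies the entire Spark installation to both workers
cd /usr/local/lib
scp -r spark-3.3.2 bigdata-hnode1:$PWD
scp -r spark-3.3.2 bigdata-hnode2:$PWD
```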
Configure the environment variables on all three machines (append to /etc/profile):
export SPARK_HOME=/usr/local/lib/spark-3.3.2
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin:$SPARK_HOME/yarn
source /etc/profile
3. Rename Spark's start/stop scripts, to avoid a name clash with Hadoop's start-all.sh and stop-all.sh:
cd $SPARK_HOME/sbin
mv start-all.sh spark-start-all.sh
mv stop-all.sh spark-stop-all.sh
Start the Spark services and the history server.
Because spark-defaults.conf above sets spark.eventLog.dir and spark.yarn.jars, HDFS must be running and the corresponding directories must be created first.
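A sketch of the HDFS preparation (the directory names follow the spark-defaults.conf entries above; uploading the jars for spark.yarn.jars is only needed when running on YARN):

```shell
# Start HDFS first (Hadoop must already be configured)
start-dfs.sh

# Event-log directory for spark.eventLog.dir / the history server
hdfs dfs -mkdir -p /spark/sparklog

# Jar directory for spark.yarn.jars, populated from the local installation
hdfs dfs -mkdir -p /spark/sparkjar
hdfs dfs -put $SPARK_HOME/jars/* /spark/sparkjar/
```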
spark-start-all.sh
start-history-server.sh
Spark web UI: bigdata-hmaster:8081
Spark history server UI: bigdata-hmaster:18080
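To verify the cluster works end to end, the bundled SparkPi example can be submitted to the standalone master (a sketch; the master URL and jar path follow the configuration above):

```shell
spark-submit \
  --master spark://bigdata-hmaster:7077 \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples_2.12-3.3.2.jar 100
```

If it succeeds, the run also appears in the history server UI once the application finishes.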