1. Extract the archive
Extract the Spark archive into the /export/server/ directory (mine is spark-3.2.0-bin-hadoop3.2.tgz):
(pyspark) [root@node1 export]# tar -zxvf spark-3.2.0-bin-hadoop3.2.tgz -C /export/server/
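Before extracting, it can help to confirm which top-level directory the archive will create, since the later steps rely on that name. The following is a minimal sketch, not part of the original steps: `top_dir` is a hypothetical helper name, and the archive path assumes the file sits in /export as in the prompt above.

```shell
# Sketch: print the top-level directory stored in a .tgz archive.
# top_dir is a hypothetical helper, not a Spark or tar command.
top_dir() {
    # tar -tzf lists the archive contents without extracting anything;
    # awk keeps the first path component of the first entry.
    tar -tzf "$1" | awk -F/ 'NR == 1 { print $1 }'
}

# Assumed location of the archive; adjust to wherever the file was downloaded.
archive=/export/spark-3.2.0-bin-hadoop3.2.tgz
if [ -f "$archive" ]; then
    echo "Archive will unpack into: $(top_dir "$archive")"
else
    echo "Archive not found: $archive"
fi
```

If the printed name differs from spark-3.2.0-bin-hadoop3.2, use the printed name wherever the later steps refer to that directory.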
2. Create a symbolic link (optional; it only makes the directory path shorter and easier to type)
(pyspark) [root@node1 server]# ln -s /export/server/spark-3.2.0-bin-hadoop3.2 /export/server/spark
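A quick way to check that the link points where expected is to resolve it. This is a small sketch under the assumption that GNU `readlink -f` is available (it is on typical Linux distributions); `check_link` is a hypothetical helper name.

```shell
# Sketch: show where a symbolic link resolves to.
# check_link is a hypothetical helper; readlink -f assumes GNU coreutils.
check_link() {
    if [ -L "$1" ]; then
        echo "$1 -> $(readlink -f "$1")"
    else
        echo "$1 is not a symbolic link"
        return 1
    fi
}

# The link created in this step; "|| true" keeps the script going if it is missing.
check_link /export/server/spark || true
```

If the link was created correctly, the resolved path should end in spark-3.2.0-bin-hadoop3.2.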
3. Configure the system-wide environment variables
(pyspark) [root@node1 server]# vim /etc/profile
Add the following lines:
export JAVA_HOME=/export/server/jdk1.8.0_241
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export HADOOP_HOME=/export/server/hadoop-3.3.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export SPARK_HOME=/export/server/spark
export PYSPARK_PYTHON=/export/server/anaconda3/envs/pyspark/bin/python3.8
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin
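Changes to /etc/profile are only read by new login shells, so the current session will not see them until the file is reloaded (for example with `source /etc/profile`) or you log in again. The sketch below reports which of the variables defined above are visible in the current shell; `check_env` is a hypothetical helper name introduced only for illustration.

```shell
# Sketch: report whether each variable from /etc/profile is set in this shell.
# check_env is a hypothetical helper, not part of Spark or Hadoop.
check_env() {
    for name in JAVA_HOME HADOOP_HOME SPARK_HOME PYSPARK_PYTHON HADOOP_CONF_DIR; do
        # Look up the variable whose name is stored in $name (POSIX-compatible).
        eval "value=\${$name:-}"
        if [ -n "$value" ]; then
            echo "$name=$value"
        else
            echo "$name is not set"
        fi
    done
}

check_env
```

Any line reading "is not set" after reloading the profile suggests a typo in the corresponding export line.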
4. Configure the environment variables in ~/.bashrc
(pyspark) [root@node1 server]# vim ~/.bashrc
Add the following lines:
export JAVA_HOME=/export/server/jdk1.8.0_241
export PYSPARK_PYTHON=/export/server/anaconda3/envs/pyspark/bin/python3.8
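If you repeat this setup, editing by hand can leave duplicate export lines in the file. As an alternative to vim, the sketch below appends a line only when it is not already present. `append_once` is a hypothetical helper, and the demo writes to a scratch file rather than the real ~/.bashrc.

```shell
# Sketch: append a line to a file only if that exact line is not already there.
# append_once is a hypothetical helper introduced for illustration.
append_once() {
    # $1 = target file, $2 = exact line to add
    touch "$1"
    # -x matches whole lines, -F treats the pattern as a fixed string
    if ! grep -qxF -- "$2" "$1"; then
        printf '%s\n' "$2" >> "$1"
    fi
}

# Demo on a scratch file; replace "$demo" with ~/.bashrc to apply it for real.
demo=$(mktemp)
append_once "$demo" 'export JAVA_HOME=/export/server/jdk1.8.0_241'
append_once "$demo" 'export JAVA_HOME=/export/server/jdk1.8.0_241'
append_once "$demo" 'export PYSPARK_PYTHON=/export/server/anaconda3/envs/pyspark/bin/python3.8'
cat "$demo"
rm -f "$demo"
```

Running the same call twice leaves a single copy of the line, so the script can be re-run safely.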
5. Start PySpark to test the installation (run from the /export/server/spark/bin directory)
(pyspark) [root@node1 bin]# ./pyspark
If the Spark welcome banner and the >>> prompt appear, PySpark has started successfully.
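If the shell does not start, a first thing to rule out is a missing or non-executable launcher. The sketch below checks for it and, when found, prints the Spark version with `spark-submit --version`. `has_pyspark` is a hypothetical helper, and the fallback path assumes the symbolic link from step 2.

```shell
# Sketch: confirm the pyspark launcher exists before trying to start it.
# has_pyspark is a hypothetical helper introduced for illustration.
has_pyspark() {
    [ -x "$1/bin/pyspark" ]
}

# Use SPARK_HOME if it is set, otherwise fall back to the link from step 2.
spark_home=${SPARK_HOME:-/export/server/spark}
if has_pyspark "$spark_home"; then
    "$spark_home/bin/spark-submit" --version
else
    echo "pyspark launcher not found under $spark_home"
fi
```

Once the interactive shell is up, a simple sanity check at the >>> prompt is `sc.parallelize(range(10)).sum()`, which should return 45.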