Reference article: https://www.jb51.net/article/163020.htm
I. Environment Preparation
1. JDK 1.8 (Windows version)
2. Scala (Windows version)
3. Spark package (spark-2.4.7-bin-hadoop2.7)
4. Hadoop package (hadoop-2.7.1)
II. Setup
1. Install the JDK and configure its environment variables;
2. Install Scala and configure its environment variables;
3. Unpack the Spark package and configure its environment variables;
4. Unpack the Hadoop package and configure its environment variables;
5. Run spark-shell in cmd.
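The variable setup in steps 1–4 can be sketched as Windows cmd commands. All install paths below are example values chosen for illustration, not paths from the original article; adjust them to your machine:

```shell
:: Minimal sketch (Windows cmd). All paths are example values.
setx JAVA_HOME "C:\Java\jdk1.8.0"
setx SCALA_HOME "C:\scala"
setx SPARK_HOME "D:\spark-2.4.7-bin-hadoop2.7"
setx HADOOP_HOME "D:\hadoop-2.7.1"
:: Add the bin directories to PATH through the System Properties GUI rather
:: than "setx PATH ...": setx truncates values longer than 1024 characters.
:: The PATH entries should read:
::   %JAVA_HOME%\bin;%SCALA_HOME%\bin;%SPARK_HOME%\bin;%HADOOP_HOME%\bin
```

After opening a new cmd window (setx does not affect the current one), spark-shell should start and print the Spark 2.4.7 banner.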
III. Problems Encountered
1. Running spark-shell in cmd fails with:
java.net.BindException: Can't assign requested address: Service 'sparkDriver' failed after 16 retries!
Fix:
(1) Find the local IP with the ipconfig command;
(2) Add this entry to the hosts file (on Windows, C:\Windows\System32\drivers\etc\hosts): 192.168.xxx.xxx spark.driver.host
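The two steps above can be sketched as follows. The IP in the hosts entry is a placeholder; the SPARK_LOCAL_IP line is an alternative workaround for the same bind error, not part of the original article's fix:

```shell
:: Step (1): find the local IPv4 address.
ipconfig | findstr /i "IPv4"
:: Step (2): append a line like the following to
:: C:\Windows\System32\drivers\etc\hosts (edit as Administrator;
:: the IP is a placeholder for the address ipconfig reported):
::   192.168.xxx.xxx   spark.driver.host
:: Alternative workaround: pin the driver to loopback for this session.
set SPARK_LOCAL_IP=127.0.0.1
```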
2. Running spark-shell in cmd fails with:
Missing Python executable 'python3', defaulting to 'C:\Users\lenovo\AppData\Local\Programs\Python\Python39\Scripts\..' for SPARK_HOME environment variable. Please install Python or specify the correct Python executable in PYSPARK_DRIVER_PYTHON or PYSPARK_PYTHON environment variable to detect SPARK_HOME safely.
Fix:
When configuring the environment variables, set SPARK_HOME first, then reference it from PATH;
do not skip SPARK_HOME and hard-code the bin directory into PATH in one step.
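Since the message itself names PYSPARK_PYTHON, explicitly pinning it to a concrete interpreter is another way to resolve the warning. A sketch, with example paths (not real defaults):

```shell
:: Example only: point Spark at an explicit Python interpreter.
setx PYSPARK_PYTHON "C:\Python39\python.exe"
:: Ensure SPARK_HOME itself is defined, with PATH referencing it as
:: %SPARK_HOME%\bin instead of a hard-coded bin directory.
setx SPARK_HOME "D:\spark-2.4.7-bin-hadoop2.7"
```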