1. Install the software
1. Python 3.10
2. hadoop-3.3.4 (remember to add winutils.exe to its bin directory)
3. java-17
4. spark-3.5.1-bin-hadoop3
Install pyspark and Jupyter Notebook with pip:
pip install pyspark
pip install jupyter notebook
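To confirm that the pip installs landed in the Python interpreter you intend to use, a small check like the one below can help. It is only a sketch: the helper name missing_packages is made up for illustration, and it assumes the importable module names are pyspark and notebook.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed module names for the two pip installs above.
    missing = missing_packages(["pyspark", "notebook"])
    print("missing:", missing if missing else "none")
```

If anything is reported as missing, the pip command was probably run with a different Python than the one executing this script.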
2. Add environment variables
- JAVA_HOME=C:\PySparkService\java-17
- HADOOP_HOME=C:\PySparkService\hadoop-3.3.4
- SPARK_HOME=C:\PySparkService\spark-3.5.1-bin-hadoop3
Add these entries to Path:
- %JAVA_HOME%\bin
- %HADOOP_HOME%\bin
- %SPARK_HOME%\bin
If the following environment variable is not set, pyspark reports an error:
PYSPARK_PYTHON=python
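The variables above can be sanity-checked from Python before starting Spark. The following is a minimal sketch, not part of Spark itself: find_problems and REQUIRED_HOMES are hypothetical names, and the check assumes winutils.exe belongs in %HADOOP_HOME%\bin as described in step 1.

```python
import ntpath
import os

# Variables from step 2 that should each point to an existing directory.
REQUIRED_HOMES = ("JAVA_HOME", "HADOOP_HOME", "SPARK_HOME")


def find_problems(env, exists=os.path.exists):
    """Return a list of human-readable problems found in the mapping env."""
    problems = []
    for name in REQUIRED_HOMES:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not exists(value):
            problems.append(f"{name} points to a missing directory: {value}")

    hadoop_home = env.get("HADOOP_HOME")
    if hadoop_home:
        # ntpath builds Windows-style paths regardless of the host platform.
        winutils = ntpath.join(hadoop_home, "bin", "winutils.exe")
        if not exists(winutils):
            problems.append(f"winutils.exe not found at {winutils}")

    if not env.get("PYSPARK_PYTHON"):
        problems.append("PYSPARK_PYTHON is not set")
    return problems


if __name__ == "__main__":
    for problem in find_problems(os.environ):
        print(problem)
```

An empty result means the variables are set and the paths exist. Open a new terminal after editing the system variables, since already-running shells keep the old values.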
# Launch pyspark in Jupyter Notebook
# If you installed Jupyter Notebook yourself, set the following environment variable
PYSPARK_DRIVER_PYTHON=jupyter