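I was using PyCharm connected to the Python environment on a virtual machine to write a simple PySpark WordCount program, and it failed with the following error: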
/export/server/anaconda3/envs/pyspark/lib/python3.8/site-packages/pyspark/bin/spark-class: line 71: export/server/jdk/bin/java: No such file or directory
/export/server/anaconda3/envs/pyspark/lib/python3.8/site-packages/pyspark/bin/spark-class: line 97: CMD: bad array subscript
Traceback (most recent call last):
File "/export/pythonfile/hello_WordCount.py", line 10, in <module>
sc = SparkContext(conf=conf)
File "/export/server/anaconda3/envs/pyspark/lib/python3.8/site-packages/pyspark/context.py", line 201, in __init__
SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
File "/export/server/anaconda3/envs/pyspark/lib/python3.8/site-packages/pyspark/context.py", line 436, in _ensure_initialized
SparkContext._gateway = gateway or launch_gateway(conf)
File "/export/server/anaconda3/envs/pyspark/lib/python3.8/site-packages/pyspark/java_gateway.py", line 107, in launch_gateway
raise PySparkRuntimeError(
pyspark.errors.exceptions.base.PySparkRuntimeError: [JAVA_GATEWAY_EXITED] Java gateway process exited before sending its port number.
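Solution:
The Java environment on Windows conflicted with the one on Linux, so spark-class tried to launch java from a path that does not exist (see the first line of the log). Setting the JAVA_HOME environment variable in the Python code, before the SparkContext is created, is enough to fix it: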
import os
os.environ['JAVA_HOME'] = "/usr/local/jdk1.8"  # change this to your own JDK path
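
For reference, here is a slightly fuller sketch of the same fix. It is only an illustration, not the original script: the helper name set_java_home, the app name, the local[*] master and the sample input are all assumptions made for the example. The helper checks that bin/java really exists under the given path before setting JAVA_HOME, so a wrong path fails with a clear message instead of the JAVA_GATEWAY_EXITED error.

```python
import os


def set_java_home(jdk_path):
    """Set JAVA_HOME to jdk_path after checking that bin/java exists there.

    Hypothetical helper for this example; it is not part of PySpark.
    """
    java_bin = os.path.join(jdk_path, "bin", "java")
    if not os.path.isfile(java_bin):
        raise FileNotFoundError(f"no java executable at {java_bin}")
    os.environ["JAVA_HOME"] = jdk_path
    return java_bin


def main():
    # JAVA_HOME must be set before the SparkContext is created, because
    # spark-class reads it when it launches the Java gateway process.
    set_java_home("/usr/local/jdk1.8")  # change this to your own JDK path

    # Imported here so the helper above can be used without PySpark installed.
    from pyspark import SparkConf, SparkContext

    # App name and master are assumptions for this sketch.
    conf = SparkConf().setAppName("WordCount").setMaster("local[*]")
    sc = SparkContext(conf=conf)
    try:
        counts = (
            sc.parallelize(["hello spark", "hello python"])  # sample input
            .flatMap(lambda line: line.split(" "))
            .map(lambda word: (word, 1))
            .reduceByKey(lambda a, b: a + b)
            .collect()
        )
        print(counts)
    finally:
        sc.stop()
```

Call main() at the bottom of the script to run it. If the JDK path is wrong, set_java_home raises FileNotFoundError before Spark is started.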
After that, the program ran successfully.