1 Problem Description
Spark is installed on a remote server, and I want to use PyCharm's remote interpreter to run Spark on that server. I started with a simple example:
from pyspark import SparkContext
sc = SparkContext(master='local', appName='first app')
However, it throws an exception: raise KeyError(key) from None KeyError: 'SPARK_HOME'. The full log follows:
ssh://geosot@162.105.17.84:22/home/geosot/software/anaconda3/envs/python35/bin/python3.5 -u /home/geosot/cm/网络大数据管理与应用/project/gmmProj/main.py
Traceback (most recent call last):
File "/home/geosot/cm/网络大数据管理与应用/project/gmmProj/main.py", line 11, in <module>
sc = SparkContext(master='local', appName='first app', sparkHome='/usr/local/spark-1.6.2-bin-hadoop2.6')
File "/home/geosot/software/anaconda3/envs/python35/lib/python3.5/site-packages/pyspark/context.py", line 112, in __init__
SparkContext._ensure_initialized(self, gateway=gateway)
File "/home/geosot/software/anaconda3/envs/python35/lib/python3.5/site-packages/pyspark/context.py", line 245, in _ensure_initialized
SparkContext._gateway = gateway or launch_gateway()
File "/home/geosot/software/anaconda3/envs/python35/lib/python3.5/site-packages/pyspark/java_gateway.py", line 48, in launch_gateway
SPARK_HOME = os.environ["SPARK_HOME"]
File "/home/geosot/software/anaconda3/envs/python35/lib/python3.5/os.py", line 725, in __getitem__
raise KeyError(key) from None
KeyError: 'SPARK_HOME'
Process finished with exit code 1
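A quick way to confirm the diagnosis is to print the variable from the remote interpreter (a minimal sketch; os.environ.get returns None when a variable is unset):
import os
print(os.environ.get('SPARK_HOME'))  # prints None when the interpreter cannot see the variable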
2 Solution
The error occurs because the remote interpreter cannot see the SPARK_HOME environment variable. I also tried passing the sparkHome parameter to SparkContext, as shown below, but that does not solve the problem: as the traceback shows, launch_gateway() reads os.environ["SPARK_HOME"] directly, before the sparkHome argument is ever used.
sc = SparkContext(master='local', appName='first app', sparkHome='/usr/local/spark-1.6.2-bin-hadoop2.6')
The fix that finally worked was to set SPARK_HOME through os.environ before creating the SparkContext:
from pyspark import SparkContext
import os

# Set SPARK_HOME before the SparkContext is created, so that
# launch_gateway() can locate the Spark installation.
os.environ['SPARK_HOME'] = '/usr/local/spark-1.6.2-bin-hadoop2.6'
sc = SparkContext(master='local', appName='first app', sparkHome='/usr/local/spark-1.6.2-bin-hadoop2.6')
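To verify that the gateway now launches, a small job can be run against the new context (a minimal sketch using standard RDD operations):
rdd = sc.parallelize(range(100))  # distribute a local sequence as an RDD
print(rdd.count())                # should print 100 once SPARK_HOME is set correctly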