Environment: Windows + PyCharm + PySpark
Error 1: OSError: [Errno 2] No such file or directory
Traceback (most recent call last):
  File "/tmp/pycharm_project_744/work/qiedian/data_preprocessing.py", line 28, in <module>
    spark = SparkSession.builder.master("yarn-client").appName("test").getOrCreate()
  File "/home/fxj/miniconda2/lib/python2.7/site-packages/pyspark/sql/session.py", line 173, in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
  File "/home/fxj/miniconda2/lib/python2.7/site-packages/pyspark/context.py", line 367, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "/home/fxj/miniconda2/lib/python2.7/site-packages/pyspark/context.py", line 133, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
  File "/home/fxj/miniconda2/lib/python2.7/site-packages/pyspark/context.py", line 316, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway(conf)
  File "/home/fxj/miniconda2/lib/python2.7/site-packages/pyspark/java_gateway.py", line 46, in launch_gateway
    return _launch_gateway(conf)
  File "/home/fxj/miniconda2/lib/python2.7/site-packages/pyspark/java_gateway.py", line 98, in _launch_gateway
    proc = Popen(command, stdin=PIPE, preexec_fn=preexec_func, env=env)
  File "/home/fxj/miniconda2/lib/python2.7/subprocess.py", line 394, in __init__
    errread, errwrite)
  File "/home/fxj/miniconda2/lib/python2.7/subprocess.py", line 1047, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory
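This error occurs because the Java environment is not configured in PyCharm: the traceback ends in the Popen call that launches the Java gateway.
In PyCharm, choose Edit Configurations.
Add JAVA_HOME to the environment variables.
JAVA_HOME must be the Java path on the server (where the interpreter runs, as the /home/fxj/... paths in the traceback show).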
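A quick way to confirm the setting took effect is to inspect the environment from inside the script before creating the SparkSession. The following is only a minimal sketch: check_java_home is a hypothetical helper name, and it assumes a Linux server where the executable is at $JAVA_HOME/bin/java.

```python
import os


def check_java_home(env):
    """Return a list of problems found with JAVA_HOME in the given mapping.

    `env` is any dict-like environment, for example os.environ.
    Assumes a Linux layout: $JAVA_HOME/bin/java.
    """
    problems = []
    java_home = env.get("JAVA_HOME")
    if not java_home:
        problems.append("JAVA_HOME is not set")
        return problems
    java_bin = os.path.join(java_home, "bin", "java")
    if not os.path.isfile(java_bin):
        problems.append("no java executable at " + java_bin)
    return problems


if __name__ == "__main__":
    # Print whatever is wrong with the environment PyCharm passed to the script.
    for problem in check_java_home(os.environ):
        print(problem)
```

If this prints nothing when run with the same run configuration, JAVA_HOME is visible to the script and points at a directory that contains bin/java.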
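Error 2:
Exception: java.lang.Exception: When running with master 'yarn' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment.
This error occurs because the Hadoop configuration paths are not defined in the environment that PyCharm uses to run the script.
First, define the paths: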
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
Then add the same two variables in PyCharm, under Edit Configurations.
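As an alternative to the run configuration, the two variables can probably also be set from the script itself, before the SparkSession is created, since the traceback in Error 1 shows the gateway being started with Popen from the Python process. This is a sketch under assumptions: hadoop_conf_env and apply_hadoop_conf are hypothetical helper names, and /opt/hadoop is a placeholder for the real HADOOP_HOME on the server.

```python
import os
import posixpath


def hadoop_conf_env(hadoop_home):
    """Build the two variables from the export lines: $HADOOP_HOME/etc/hadoop."""
    conf_dir = posixpath.join(hadoop_home, "etc", "hadoop")
    return {"HADOOP_CONF_DIR": conf_dir, "YARN_CONF_DIR": conf_dir}


def apply_hadoop_conf(hadoop_home, env=os.environ):
    """Set the variables in `env` unless they are already defined there."""
    for name, value in hadoop_conf_env(hadoop_home).items():
        env.setdefault(name, value)
    return env


# Usage (placeholder path; call this before SparkSession.builder...getOrCreate()):
# apply_hadoop_conf("/opt/hadoop")
```

setdefault is used so that values already supplied by PyCharm or the shell are not overwritten.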