Exception: Python in worker has different version 3.6 than that in driver 2.7, PySpark cannot run with different minor versions. Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set
from __future__ import print_function
import os
from pyspark import SparkContext

# Point the workers at the local Python 2 interpreter (the path must match the local install).
os.environ['PYSPARK_PYTHON'] = 'C:/Python27/python2.exe'

if __name__ == '__main__':
    # sc = SparkContext("local")
    sc = SparkContext(appName='first App')

    # Classic word count: emit one (word, 1) pair per token, then sum the counts per word.
    rdd = sc.parallelize("hello PySpark world".split(' '))
    counts = rdd.map(lambda word: (word, 1)) \
        .reduceByKey(lambda a, b: a + b)

    # counts.saveAsTextFile('F:/out')
    counts.foreach(print)
    sc.stop()
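The snippet above only sets PYSPARK_PYTHON, i.e. the interpreter used by the workers. Since the exception also names PYSPARK_DRIVER_PYTHON, a more complete fix is to pin both variables to the same interpreter before the SparkContext is created. A minimal sketch, with placeholder paths that must be adjusted to the local installs:

import os
from pyspark import SparkContext

# Both variables must point to the same (minor) Python version; the paths are placeholders.
os.environ['PYSPARK_PYTHON'] = 'C:/Python27/python.exe'          # interpreter for the workers
os.environ['PYSPARK_DRIVER_PYTHON'] = 'C:/Python27/python.exe'   # interpreter for the driver

sc = SparkContext(appName='first App')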
This problem occurs because the Python version of the PySpark worker environment does not match that of the driver, i.e. the Python environment on the master node.
Both a Python 2 and a Python 3 environment are installed locally. With PYSPARK_PYTHON specified inside the program, the job runs fine in PyCharm, but submitting it with spark-submit still raises the same error: spark-submit picks up the Python 2 environment by default, while the interpreter configured in PyCharm is Python 3, which produces exactly the mismatch above. Pointing the PyCharm Python interpreter at Python 2 makes it run normally.
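A quick way to confirm which interpreter each side actually uses is to print sys.version on the driver and inside a task. This is only a diagnostic sketch and assumes an already-created SparkContext sc:

import sys
print('driver Python :', sys.version)
# Run one tiny task so a worker reports its own interpreter version.
worker_version = sc.parallelize([0], 1).map(lambda _: sys.version).first()
print('worker Python :', worker_version)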
To make spark-submit work by changing the default Python on Windows 10, it is enough to adjust which Python directory comes first in the PATH environment variable.
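As a quick check of which interpreter spark-submit will pick up from PATH, the Windows `where` command lists every python.exe it finds, in PATH order; the first entry is the one that wins. A small sketch (Windows only):

import subprocess
# 'where' is a standard Windows command; it prints one matching path per line, in PATH order.
print(subprocess.check_output(['where', 'python']).decode())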