Setting up PySpark with Python 3.7, and fixing "py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe"
Environment setup
JDK: java version "1.8.0_66"
Python 3.7
spark-2.3.1-bin-hadoop2.7.tgz
Environment variables
export PYSPARK_PYTHON=python3
export PYSPARK_DRIVER_PYTHON=ipython3
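To make these interpreter settings survive new terminal sessions, they can be appended to the shell profile; a minimal sketch, assuming `~/.bash_profile` is the right file for your shell (use `~/.zshrc` or similar otherwise):

```shell
# Persist the PySpark interpreter settings (profile path is an assumption).
export PYSPARK_PYTHON=python3          # Python used by Spark workers
export PYSPARK_DRIVER_PYTHON=ipython3  # Python used by the driver shell (IPython)
```

After editing the profile, run `source ~/.bash_profile` (or open a new terminal) before starting `./bin/pyspark`.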
mac-abeen:spark-2.3.1-bin-hadoop2.7 abeen$ ./bin/pyspark
Python 3.7.0 (v3.7.0:1bf9cc5093, Jun 26 2018, 20:42:06)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.4.0 -- An enhanced Interactive Python. Type '?' for help.
Using Python version 3.7.0 (v3.7.0:1bf9cc5093, Jun 26 2018 20:42:06)
SparkSession available as 'spark'.
In [1]: sc
Out[1]:
In [2]: lines = sc.textFile("README.md")
In [3]: lines.count()
Out[3]: 103
In [4]: lines.first()
Out[4]: '# Apache Spark'
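For readers without a Spark shell at hand, here is a plain-Python sketch of what the two RDD calls above compute: `sc.textFile()` splits the file into one record per line, `count()` returns the number of lines, and `first()` returns the first line. The sample text below stands in for README.md and is made up for illustration:

```python
# Plain-Python sketch of the RDD operations above, without Spark.
# The sample text is an assumption standing in for README.md.
sample = "# Apache Spark\n\nSpark is a fast engine.\n"

lines = sample.splitlines()  # roughly the records sc.textFile() would produce
print(len(lines))            # like lines.count() -> 3
print(lines[0])              # like lines.first() -> '# Apache Spark'
```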
Fixing the Py4JJavaError from PythonRDD.collectAndServe
Note: spark-2.3.1-bin-hadoop2.7 does not yet support java version "9.0.4". If you hit this error, check whether your JDK version is supported; after switching to a supported JDK (such as Java 8), rerun the commands below and the error should be gone.
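If several JDKs are installed on macOS, one way to point the current shell at Java 8 is the system `java_home` locator; this is a sketch assuming a 1.8 JDK is actually installed:

```shell
# macOS only: select an installed Java 8 JDK for the current shell session.
# /usr/libexec/java_home locates JDKs; -v 1.8 asks for a Java 8 one.
export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)
java -version   # verify it now reports a 1.8.0_xx build
```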
./bin/pyspark
>>> lines = sc.textFile("README.md")
>>> lines.count()